Abstract

In this article, we consider the problem of simultaneous testing of hypotheses when the individual test statistics are not necessarily independent. Specifically, we consider the problem of simultaneous testing of point null hypotheses against two-sided alternatives about the mean parameters of normally distributed random variables. We assume that, conditionally on the vector of means, these random variables jointly follow a multivariate normal distribution with a known but arbitrary covariance matrix. We consider a Bayesian framework where each unknown mean is modeled via a two-component point mass mixture prior, whereby unconditionally the test statistics jointly have a mixture of multivariate normal distributions. A new testing procedure is developed that uses the dependence among the test statistics and works in a step-down manner. This procedure is general enough to be applied even to non-normal data. A decision theoretic justification in favor of the proposed testing procedure is provided by showing that, unlike the traditional p-value based stepwise procedures, this new method possesses a certain convexity property which is essential for the admissibility of a multiple testing procedure with respect to the vector risk function. Consistent estimation of the unknown proportion of alternative hypotheses and of the variance of the distribution of the non-zero means is theoretically investigated. An alternative representation of the proposed test statistics is also established, resulting in a great reduction in computational complexity. It is demonstrated through extensive simulations that, for various forms of dependence and a wide range of sparsity levels, the proposed testing procedure compares quite favourably with several existing multiple testing procedures available in the literature in terms of overall misclassification probability.


1 Introduction 


During the past two decades, multiple hypothesis testing has been one of the most significant areas of research, particularly for its wide applicability in the analysis of high throughput data coming from various scientific fields. For example, in microarray experiments, one tests thousands of hypotheses simultaneously to decide which genes are differentially expressed, that is, which genes are associated with some biological trait of interest. Many different multiple testing procedures have been proposed in the literature so far, mostly with the aim of controlling some overall measure of type I error at a predetermined level $\alpha$. The familywise error rate (FWER), defined as the probability of making at least one false rejection, is a type I error measure which has been known and used for a very long time, with the Bonferroni correction being the most widely used FWER controlling procedure. Among the various type I error measures, the false discovery rate (FDR), due to Benjamini and Hochberg (1995), has received the most attention from researchers in recent times. It is defined as the expected proportion of erroneously rejected null hypotheses among all the hypotheses that are rejected. In their seminal work, Benjamini and Hochberg (1995) show that when the test statistics corresponding to the true null hypotheses are independent, their method, henceforth referred to as the BH method, controls the FDR at a prespecified level $\alpha$. A related FDR controlling procedure that also works under independence is given in Benjamini and Liu (1999). In an important paper, Storey (2002) introduced an approach where one estimates the FDR corresponding to a fixed critical region; using this, an FDR controlling procedure was subsequently proposed in Storey, Taylor and Siegmund (2004). An essential part of this approach is plugging in an estimate of the proportion of true nulls. Storey, Taylor and Siegmund (2004) show that their procedure provides control over the FDR at a prespecified level under the assumption of independence of the null p-values.
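For concreteness, here is a minimal sketch of the BH step-up rule referred to above, in plain Python. The function name `benjamini_hochberg` is our own, and the sketch assumes the p-values have already been computed; it is an illustration, not a reference implementation.

```python
def benjamini_hochberg(pvals, alpha=0.05):
    """Return the sorted 0-based indices rejected by the BH step-up rule at level alpha."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by increasing p-value
    k_max = 0
    for rank, i in enumerate(order, start=1):
        # Step-up: remember the LARGEST rank k with p_(k) <= k * alpha / m.
        if pvals[i] <= rank * alpha / m:
            k_max = rank
    return sorted(order[:k_max])


print(benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]))  # [0, 1]
```

The loop keeps the largest qualifying rank instead of stopping at the first failure, which is what makes BH a step-up rule. For example, with two p-values both equal to 0.04 and $\alpha = 0.05$, both are rejected even though the smaller one exceeds $\alpha/2$.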


The theoretical properties of these procedures rely on the assumption that the underlying test statistics are independent. In practice, however, test statistics are often dependent, and it has been observed in simulations and theoretical investigations that the performance of BH may not remain satisfactory in such cases. This issue has received significant attention from researchers in the last decade and a half. Firstly, there have been attempts to prove that BH retains its FDR controlling properties under certain forms of dependence. For example, Benjamini and Yekutieli (2001) show that the BH method can be applied with the desired FDR control when the test statistics exhibit a form of dependence known as “positive regression dependence”. Secondly, there have been attempts to come up with new procedures suitable for the dependent case. We first mention Benjamini and Yekutieli (2001), who propose a multiple testing procedure, referred to as the Benjamini-Yekutieli (BY) procedure, which controls the FDR under any form of dependence. Sarkar (2002) extends their work by showing that some generalized step-up and step-down procedures control the FDR at the desired level under general dependence. Storey, Taylor and Siegmund (2004) show that their approach has a desirable property in that it estimates the FDR conservatively in the asymptotic sense under certain forms of weak dependence. Romano, Shaikh and Wolf (2008) consider a bootstrap approach for achieving asymptotic control over the FDR under dependence. These methods are based on p-values, which depend only on the marginal distributions of the corresponding test statistics and do not take into account the correlations between them. Efron (2007) proposes a novel testing procedure which depends on estimating a certain dispersion variate that acts as a measure of the overall correlation between the data. Leek and Storey (2008) and Friguet, Kloareg and Causeur (2009) propose multiple testing procedures based on factor model approaches. Fan, Han and Gu (2012) propose a new multiple testing procedure referred to as the Principal Factor Approximation (PFA) approach. They develop a novel testing procedure based on a set of dependence-adjusted p-values, when the underlying test statistics jointly follow a multivariate normal distribution with an arbitrary but known covariance matrix.


The above approaches study multiple testing under dependence from a frequentist point of view. The Bayesian approaches available in the literature for this problem are relatively few. Sun and Cai (2009) propose an oracle method combined with an asymptotically optimal data-adaptive testing procedure within a Bayesian decision theoretic framework. They assume the underlying model parameters to be generated according to a homogeneous and irreducible two-state hidden Markov chain. Brown, Lazar and Datta (2011) introduce a conditional autoregressive model to account for the spatial dependence in the data for an fMRI analysis using a Bayesian approach. One of the simplest and most natural approaches to modelling dependent data is to use a multivariate normal model; it is probably the most widely used model for capturing dependence. Xie et al. (2011) propose a Bayesian testing procedure for the two-sided multivariate normal mean problem, assuming a short-range dependence covariance structure. Their method is based on the central idea of approximating the joint oracle statistics by the corresponding marginal statistics arising out of a two-component Gaussian mixture model, and estimating these statistics as if the data were independent. Although their method performs well in the short-range dependence case, it is not expected to perform similarly under other forms of dependence. As in Xie et al. (2011), we consider in this article the problem of simultaneous testing of the individual components of a multivariate normal mean using a Bayesian approach. The difference in our approach is that we develop a procedure that is applicable under arbitrary dependence, provided the covariance matrix is known.


Suppose we observe data $X = (X_1, \dots, X_m)$ such that $X \mid \mu \sim N_m(\mu, \Sigma)$, where $\mu = (\mu_1, \dots, \mu_m)'$ is an $m \times 1$ vector of unknown means and $\Sigma$ is an $m \times m$ positive definite covariance matrix, assumed to be known. Note that we do not impose here any restriction on whether $X_1, \dots, X_m$ are strongly or weakly correlated. We are interested in testing the following:

$$H_{0i}: \mu_i = 0 \quad \text{against} \quad H_{Ai}: \mu_i \neq 0, \qquad i = 1, \dots, m.$$

Since $\Sigma$ is assumed to be known, without any loss of generality one may assume $\Sigma$ to be the correlation matrix of the $X_i$'s, so that $X_i \mid \mu_i \sim N(\mu_i, 1)$ for $i = 1, \dots, m$. Note that this transformation leaves the testing problem unchanged. We consider a Bayesian approach by introducing a set of latent variables that determine a two-component point mass mixture prior for each of the $\mu_i$'s. Such a two-groups formulation is considered a very natural way of formulating a multiple testing problem from a Bayesian viewpoint. See Efron, Tibshirani, Storey and Tusher (2001), Efron (200?), Scott and Berger (200?), Bogdan, Ghosh and Tokdar (2008), Sun and Cai (2007), Scott and Berger (2010), Bogdan et al. (2011) and Xie et al. (2011) in this context. Using this two-groups framework, we develop a new Bayesian testing procedure that works in a step-down manner, which we call the Bayesian Step Down (BSD) procedure.
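As a small illustration of the standardization step mentioned above, the sketch below (which assumes NumPy; the helper name `standardize` is hypothetical) rescales each $X_i$ by its standard deviation so that the covariance matrix becomes the correlation matrix. Since $\mu_i = 0$ exactly when $\mu_i/\sigma_i = 0$, each hypothesis is unaffected by the rescaling.

```python
import numpy as np


def standardize(x, sigma):
    """Rescale X_i by its standard deviation; the covariance becomes a correlation matrix.

    mu_i = 0 holds iff mu_i / sd_i = 0, so every H_0i is unchanged by this rescaling.
    """
    sd = np.sqrt(np.diag(sigma))
    return x / sd, sigma / np.outer(sd, sd)


z, r = standardize(np.array([2.0, 3.0]), np.array([[4.0, 2.0], [2.0, 9.0]]))
print(z)  # [1. 1.]
print(r)  # unit diagonal, off-diagonal entries 2 / (2 * 3) = 1/3
```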


The BSD procedure has several desirable properties as a multiple testing procedure under dependence. Firstly, the BSD method fully utilizes the dependence between the test statistics $X$ at every stage and is applicable under arbitrary known covariance matrices. Most of the existing multiple testing procedures, such as the p-value based stepwise methods or Bayesian methods like that of Xie et al. (2011), do not incorporate such information. Secondly, as explained below, unlike the typical p-value based stepwise procedures, the BSD procedure has a very important admissibility property, in that it is not possible to find another multiple testing procedure such that the individual tests induced by that procedure are uniformly better than those induced by the BSD procedure in terms of the risk corresponding to the usual 0-1 loss. Thirdly, it is easily implementable and avoids Markov chain Monte Carlo type computations, which can often be very demanding from a computational point of view, especially when $m$ is large. It can be applied in all situations where $\Sigma$ is known, in contrast to some well known multiple testing procedures which are only valid for special forms of dependence among the test statistics, for example, positive regression dependence. We observe in our numerical studies that, for arbitrary choices of the covariance matrix $\Sigma$ and a wide range of sparse situations, the Bayes misclassification risk of the proposed BSD method compares favorably with that of several existing methods, with very high power, where power is defined as the expected proportion of correctly identified signals. Moreover, it gives us a generic multiple testing algorithm which is applicable even for non-normal models.


We observe that there exists a close connection between the BSD method and the Maximum Residual Down (MRD) method, a step-down type method developed by Cohen, Sackrowitz and Xu (2009) for this multiple testing problem. We show a functional relationship between the BSD statistics and the corresponding MRD statistics. This helps us establish an important theoretical justification for the use of the BSD procedure from a frequentist viewpoint. We show that when $X$ is generated according to a fixed but unknown mean vector $\mu$ with an arbitrary known positive definite covariance matrix, the BSD method possesses a certain convexity property which is both necessary and sufficient for the admissibility of a multiple testing procedure. For this purpose, we call a multiple testing procedure admissible if each of the induced testing rules is admissible with respect to the usual loss function for the corresponding individual testing problem. In a series of important research articles, Cohen and Sackrowitz (2005), Cohen and Sackrowitz (2007), Cohen and Sackrowitz (2008) and Cohen, Kolassa and Sackrowitz (2007) show that in many circumstances, given a typical p-value based step-up or step-down method, it is always possible to construct another multiple testing procedure having a smaller expected number of type I and type II errors. Moreover, Cohen and Sackrowitz (2008) demonstrate that in the context of a general linear regression problem or a treatment versus control study, for testing point null hypotheses against two-sided alternatives, there exist procedures whose individual tests have smaller expected numbers of type I and type II errors than the traditional p-value based stepwise methods. Thus the typical p-value based stepwise procedures become inadmissible whenever the risk is an increasing function of the expected numbers of type I and type II errors. Cohen, Sackrowitz and Xu (2009) show that their proposed MRD method possesses the aforesaid admissibility property. Our proof of admissibility of the BSD method borrows the basic architecture of the corresponding proof for the MRD procedure in Cohen, Sackrowitz and Xu (2009), but it does not follow as a direct consequence of that proof. As will be evident later in this paper, we need a general technique invoking new arguments, which also makes the proofs of some results in Cohen, Sackrowitz and Xu (2009) more explicit. Our argument also shows that, within the aforesaid frequentist framework, any multiple testing procedure based on a set of test statistics which are strictly increasing functions of the absolute values of the corresponding MRD test statistics will be admissible, thus extending the result of Cohen, Sackrowitz and Xu (2009); this is another important contribution of this paper.


The MRD procedure depends on a decreasing sequence of critical constants $C_1 \geq C_2 \geq \cdots \geq C_m > 0$, the choice of which is somewhat ad hoc and varies with $\Sigma$. The performance of the MRD procedure thus depends critically on an appropriate choice of $C_1 \geq \cdots \geq C_m$, and utmost care needs to be taken in choosing these $C_i$'s. As opposed to this, we choose the thresholding constant $\delta$ used in the BSD method (see Section 2) to be equal to 1. Thus we have an automatic default choice of the corresponding significance thresholding constant for the BSD procedure that works for all choices of $\Sigma$. It can be shown that a hypothesis can be rejected by the BSD method only when it is also rejected by the MRD test criterion used with a certain sequence $C_1(X) \geq \cdots \geq C_m(X) > 0$ of data-dependent critical constants.


Practical implementation of the BSD method requires knowledge of the proportion of true alternatives and of the variance of the distribution of the non-null $\mu_i$'s, along with certain posterior probabilities used in the definition of the BSD statistics in Section 2. When this knowledge is unavailable, one can use a full Bayes approach by placing hyperpriors on the underlying model parameters and then implement an MCMC method to estimate each of these quantities. However, as mentioned before, this can be computationally very expensive, especially when the total number $m$ of null hypotheses under consideration is large. So we prefer an empirical Bayes approach, where one estimates each of these underlying model parameters from the data and plugs them into the BSD procedure. Estimation of the theoretical proportion of true alternatives has so far been an important problem in the domain of multiple testing, and several methods have been proposed in the literature. See, for example, Efron (2004), Meinshausen and Rice (2006), Jin (2006), Jin and Cai (2007), Cai, Jin and Low (2007) and Cai and Jin (2010), and references therein. Cai and Jin (2010) propose an estimator that is both consistent and attains the corresponding minimax optimal rate of convergence. Their method, however, is proposed for the case where the test statistics are i.i.d. observations from a mixture of Gaussian distributions. In this article, we consider their estimator and show that it is consistent in a broad range of weakly dependent situations. For estimating the remaining parameter, we consider a natural moment-based estimator of the non-null variance, which we show to be consistent under weak and some stronger forms of dependence, given that we already have a consistent estimator of the proportion of true alternatives.
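To make the moment-based idea concrete, here is one natural moment estimator, sketched under stated assumptions and not necessarily the estimator analysed in Section 4. With $\Sigma$ a correlation matrix and $g$ the $N(0, V)$ slab adopted in Section 2, each $X_i$ satisfies $E(X_i^2) = (1 - p) + p(1 + V) = 1 + pV$, so $V$ can be estimated by $(m^{-1}\sum_i X_i^2 - 1)/\hat p$. The function name and the simulation settings are hypothetical; the true $p$ is plugged in place of an estimate $\hat p$ purely for illustration, and the data are simulated independently for simplicity (NumPy assumed).

```python
import numpy as np


def moment_estimate_v(x, p_hat):
    """Hypothetical moment estimator of the slab variance V.

    Assumes unit marginal variances and g = N(0, V), so that
    E[X_i^2] = (1 - p) * 1 + p * (1 + V) = 1 + p * V.
    """
    second_moment = np.mean(np.asarray(x, dtype=float) ** 2)
    return max((second_moment - 1.0) / p_hat, 0.0)  # truncate negative values at 0


# Independent-case illustration (Sigma = I) with true p = 0.1 and V = 9.
rng = np.random.default_rng(42)
m, p, v = 200_000, 0.1, 9.0
nu = rng.random(m) < p                                   # nu_i ~ Bernoulli(p)
mu = np.where(nu, rng.normal(0.0, np.sqrt(v), m), 0.0)   # spike at 0, slab N(0, V)
x = mu + rng.normal(size=m)
v_hat = moment_estimate_v(x, p_hat=p)                    # true p used in place of p_hat
print(round(v_hat, 2))
```

Under dependence the identity $E(X_i^2) = 1 + pV$ still holds coordinatewise; what dependence affects is how fast the sample average concentrates.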


We conclude the introduction by explaining a very useful contribution of this work from a computational point of view. Both the BSD method (or its empirical version as explained above) and the MRD method require the inversion of $(m - t + 1)$ submatrices of $\Sigma$ and the computation of $(m - t + 1)$ ratios of determinants at each step $t$, $t = 1, \dots, m$, which may be computationally very costly, especially when $m$ is large. We derive an important alternative representation of the BSD test statistics, due to which at each step we need to compute the inverse of only one submatrix and can also avoid computing the $(m - t + 1)$ ratios of determinants mentioned above. This reduces the overall computational complexity of the BSD method to a large extent. Such a representation works for any form of the covariance matrix $\Sigma$. It not only makes the BSD procedure computationally much faster than in its original form, but does the same for the MRD method as well. This would be particularly useful when $\Sigma$ corresponds to an intraclass correlation or a block (clumpy) dependence matrix. In particular, for the intraclass correlation model, we do not even require inversion of any matrix, and thus the BSD method can be applied for arbitrarily large $m$.
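To illustrate why the intraclass correlation model needs no matrix inversion, the sketch below (assuming NumPy; the function name is hypothetical) evaluates the two ingredients of a Gaussian log-density, $x'\Sigma^{-1}x$ and $\log|\Sigma|$, in $O(m)$ time for $\Sigma = (1-\rho)I + \rho\mathbf{1}\mathbf{1}'$, valid for $-1/(m-1) < \rho < 1$, using the Sherman-Morrison formula. It is a generic illustration, not the alternative representation derived in Section 5.

```python
import numpy as np


def intraclass_quadform_logdet(x, rho):
    """x' Sigma^{-1} x and log|Sigma| for Sigma = (1 - rho) I + rho 11', in O(m).

    Sherman-Morrison: Sigma^{-1} = [I - rho / (1 + (m - 1) rho) 11'] / (1 - rho);
    eigenvalues: 1 - rho (multiplicity m - 1) and 1 + (m - 1) rho.
    """
    x = np.asarray(x, dtype=float)
    m, a = x.size, 1.0 - rho
    b = a + m * rho                                  # equals 1 + (m - 1) rho
    quad = (x @ x - rho * x.sum() ** 2 / b) / a
    logdet = (m - 1) * np.log(a) + np.log(b)
    return quad, logdet


# Cross-check against a dense computation for a small m.
m, rho = 6, 0.4
sigma = (1 - rho) * np.eye(m) + rho * np.ones((m, m))
x = np.arange(1.0, m + 1)
quad, logdet = intraclass_quadform_logdet(x, rho)
print(np.isclose(quad, x @ np.linalg.solve(sigma, x)),
      np.isclose(logdet, np.linalg.slogdet(sigma)[1]))  # True True
```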

The organisation of the paper is as follows. Section 2 gives our prior specification and the motivation and description of the proposed Bayesian Step Down procedure. Section 3 shows the connection between the BSD procedure and the MRD method and the admissibility property possessed by the BSD method. Consistent estimation of the proportion of true alternatives and of the variance of the non-null distribution of the $\mu_i$'s is discussed in Section 4. An equivalent representation of the BSD statistics and associated results are given in Section 5. The performance of the BSD method in simulation studies is demonstrated in Section 6. Proofs of most of the theoretical results are given in the Appendix (Section 7), followed by some concluding remarks in Section 8.


2 Statistical Model and The Bayesian Step Down Method 

For each $i = 1, \dots, m$, let us define an indicator variable $\nu_i$ which takes the value 1 if $H_{Ai}$ is true and 0 if $H_{Ai}$ is false. Here $\nu_1, \dots, \nu_m$ are unobservable. It is assumed that $\nu_i \overset{\text{i.i.d.}}{\sim} \text{Bernoulli}(p)$ for some $p \in (0, 1)$. The parameter $p$ is often interpreted as the theoretical proportion of true alternatives. Given $\nu_i = 0$, $\mu_i$ is assumed to follow the distribution degenerate at the point 0, while given $\nu_i = 1$ it is assumed to follow some absolutely continuous density $g(\mu)$. Thus the $\mu_i$'s are modelled as independent and identically distributed (i.i.d.) random observations from the following two-component mixture distribution, often referred to as a spike-and-slab prior:

$$\mu_i \overset{\text{i.i.d.}}{\sim} (1 - p)\,\delta_{\{0\}} + p\, g(\mu), \qquad i = 1, \dots, m, \tag{2.1}$$

so that the common marginal distribution of the $X_i$'s is given by the following Gaussian mixture density:

$$X_i \sim (1 - p)\, f_0(x) + p\, f_1(x), \qquad i = 1, \dots, m, \tag{2.2}$$

where $f_0 = \phi$ and $f_1(x) = \int \phi(x - \mu)\, g(\mu)\, d\mu$ is the convolution of $g$ with $\phi(\cdot)$, $\phi(\cdot)$ denoting the standard normal density over $\mathbb{R}$.

As mentioned in the Introduction, the above two-groups formulation in (2.2) is considered a natural way of formulating a problem of this kind. In a large number of practical problems, one can model the data through a mixture of Gaussian distributions such as (2.2). Moreover, it is well known that the set of all Gaussian mixture densities is dense in the set of all density functions under the $L_1$ metric.

Usually $g$ is assumed to be an absolutely continuous density over $\mathbb{R}$ with a flat tail. A natural choice for $g$ is a univariate normal density with some large variance. So, let us assume $g$ to be the density corresponding to a $N(0, V)$ distribution, where $V$ is assumed to be large. In such a case, the marginal joint distribution of $X = (X_1, \dots, X_m)$ is given by Theorem 2.1 below.
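The two-groups model (2.1)-(2.2) with this choice of $g$ can be simulated directly. In the sketch below (assuming NumPy; the helper names are hypothetical), $f_1$ is the $N(0, 1 + V)$ density, the convolution of $N(0, V)$ with $\phi$; this presumes unit marginal variances, as after the standardization described in the Introduction.

```python
import numpy as np


def simulate_two_groups(m, p, v, sigma, rng):
    """Draw (nu, mu, X) from the spike-and-slab prior (2.1) with g = N(0, V)."""
    nu = rng.random(m) < p                                   # nu_i ~ Bernoulli(p)
    mu = np.where(nu, rng.normal(0.0, np.sqrt(v), m), 0.0)   # spike at 0, slab N(0, V)
    x = rng.multivariate_normal(mu, sigma)                   # X | mu ~ N_m(mu, Sigma)
    return nu, mu, x


def marginal_density(x, p, v):
    """Mixture (2.2) with g = N(0, V): f_0 is the N(0, 1) and f_1 the N(0, 1 + V) density."""
    def normal_pdf(z, var):
        return np.exp(-z ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
    return (1 - p) * normal_pdf(x, 1.0) + p * normal_pdf(x, 1.0 + v)


rng = np.random.default_rng(0)
sigma = 0.5 * np.eye(5) + 0.5 * np.ones((5, 5))   # equicorrelated, unit variances
nu, mu, x = simulate_two_groups(5, 0.2, 9.0, sigma, rng)
grid = np.linspace(-60, 60, 200_001)
total = marginal_density(grid, 0.2, 9.0).sum() * (grid[1] - grid[0])
print(total)  # numerically very close to 1
```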

Theorem 2.1. Under the above model assumptions, the conditional distribution of $X = (X_1, \dots, X_m)$ given $\nu = (\nu_1, \dots, \nu_m)$ is

$$X \mid \nu \sim N_m(0, \Sigma + V D_\nu) \tag{2.3}$$

and the marginal joint distribution of $X = (X_1, \dots, X_m)$ is

$$X \sim \sum_{\nu \in \{0,1\}^m} \pi(\nu)\, N_m(0, \Sigma + V D_\nu), \tag{2.4}$$

where $\pi(\nu) = \prod_{i=1}^m p^{\nu_i}(1 - p)^{1 - \nu_i}$ denotes the prior distribution of $(\nu_1, \dots, \nu_m)$ and $D_\nu$ is a diagonal matrix with diagonal elements $\nu_1, \dots, \nu_m$.

Proof. See Appendix. ■ 
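A quick Monte Carlo check of (2.3), offered as a sketch rather than a proof: for a fixed configuration $\nu$, drawing $\mu_i \sim N(0, V)$ where $\nu_i = 1$ and $\mu_i = 0$ otherwise, and then $X \mid \mu \sim N_3(\mu, \Sigma)$, the sample mean of $X$ should approach 0 and its sample covariance $\Sigma + V D_\nu$. The correlation matrix, $V$ and $\nu$ below are arbitrary illustrative values, and NumPy is assumed.

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = np.array([[1.0, 0.5, 0.2],
                  [0.5, 1.0, 0.5],
                  [0.2, 0.5, 1.0]])   # an arbitrary correlation matrix
v, nu = 4.0, np.array([1, 0, 1])      # a fixed configuration nu
n = 200_000

# mu_i ~ N(0, V) where nu_i = 1 and mu_i = 0 otherwise; then X | mu ~ N_3(mu, Sigma).
mu = rng.normal(0.0, np.sqrt(v), size=(n, 3)) * nu
x = mu + rng.multivariate_normal(np.zeros(3), sigma, size=n)

empirical = np.cov(x, rowvar=False)
theoretical = sigma + v * np.diag(nu)  # Sigma + V D_nu from (2.3)
print(np.round(empirical, 2))
print(theoretical)
```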
Remark 2.1. Note that the distributions given in (2.3) and (2.4) are both conditional on the model parameters $p$ and $V$.

Our present multiple testing problem is now equivalent to simultaneously testing the following:

$$H_{0i}: \nu_i = 0 \quad \text{against} \quad H_{Ai}: \nu_i = 1, \qquad i = 1, \dots, m. \tag{2.5}$$


This multiple testing problem can also be considered a model selection problem, where the aim is to choose the most plausible configuration of the $\nu_i$'s amongst the set of all $2^m$ possible choices. From a Bayesian viewpoint, the highest posterior probability model, given by

$$\operatorname*{argmax}_{(\nu_1, \dots, \nu_m)} \pi\big((\nu_1, \dots, \nu_m) \mid X\big),$$

is the Bayes rule with respect to the 0-1 loss. If one wants to minimize the risk with respect to the total misclassification loss, the optimal Bayesian solution is given by the median probability model, where

$$\nu_i^* = I\{\pi(\nu_i = 1 \mid X) > 0.5\}, \qquad i = 1, \dots, m.$$

But finding the highest posterior probability model in this situation requires an exploration over all $2^m$ possible configurations of $(\nu_1, \dots, \nu_m)$. Similarly, under the present set-up, finding the median probability model requires the evaluation of the posterior inclusion probability of the $i$-th alternative hypothesis $H_{Ai}$, which is given by

$$\pi(\nu_i = 1 \mid X) = \frac{\sum_{(\nu_1,\dots,\nu_m):\, \nu_i = 1} \pi\big((\nu_1,\dots,\nu_m)\big)\, f\big(X \mid (\nu_1,\dots,\nu_m)\big)}{\sum_{(\nu_1,\dots,\nu_m) \in \{0,1\}^m} \pi\big((\nu_1,\dots,\nu_m)\big)\, f\big(X \mid (\nu_1,\dots,\nu_m)\big)}, \tag{2.6}$$

where $\pi((\nu_1, \dots, \nu_m))$ and $f(x \mid (\nu_1, \dots, \nu_m))$ denote the prior distribution of $(\nu_1, \dots, \nu_m)$ and the density corresponding to the conditional distribution of $X \mid (\nu_1, \dots, \nu_m)$, respectively. It therefore becomes evident from (2.6) that one needs to sum over $2^m$ terms in order to obtain $\pi(\nu_i = 1 \mid X)$ for each $i = 1, \dots, m$, resulting in an overall computational complexity of order $O(m 2^m)$. Clearly, with either of these two approaches, one faces a daunting computational task even when the conditional density $f(x \mid (\nu_1, \dots, \nu_m))$ is completely known and the number of hypotheses $m$ is only moderately large. Chen and Sarkar (2004) propose a novel Bayesian step-down method based on a set of conditional Bayes factors. Their method, though originally proposed and implemented when the $X_i$'s are conditionally independent, is also applicable in more general situations. However, although it is a natural Bayesian analogue of frequentist-type step-down procedures, their method crucially hinges upon an initial ordering of the null hypotheses using the marginal Bayes factors. For the present multiple testing problem, each of these marginal Bayes factors is in one-to-one correspondence with the posterior probability of the corresponding null hypothesis $H_{0i}$ being true, or equivalently with $\pi(\nu_i = 1 \mid X)$ in (2.6), resulting in the same computational issue as with the median probability model.
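The exponential cost is easy to see in code. The sketch below (assuming NumPy; the function names are hypothetical) computes every posterior inclusion probability $\pi(\nu_i = 1 \mid X)$ by enumerating all $2^m$ configurations, using the Gaussian form $X \mid \nu \sim N_m(0, \Sigma + V D_\nu)$ from Theorem 2.1, and then reads off the median probability model. It is only feasible for small $m$.

```python
import itertools

import numpy as np


def log_mvn0(x, cov):
    """Log density of N(0, cov) evaluated at x."""
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (x.size * np.log(2 * np.pi) + logdet + x @ np.linalg.solve(cov, x))


def posterior_inclusion(x, sigma, p, v):
    """pi(nu_i = 1 | X) for every i by brute-force enumeration: O(m 2^m) work."""
    m = x.size
    configs = np.array(list(itertools.product([0, 1], repeat=m)))  # all 2^m vectors nu
    log_post = np.array([
        k * np.log(p) + (m - k) * np.log(1 - p) + log_mvn0(x, sigma + v * np.diag(nu))
        for nu, k in zip(configs, configs.sum(axis=1))
    ])                                           # log pi(nu) + log f(x | nu), up to a constant
    w = np.exp(log_post - log_post.max())        # stabilize before normalizing
    w /= w.sum()
    return configs.T @ w                         # total posterior weight with nu_i = 1


x = np.array([0.0, 0.0, 6.0])
probs = posterior_inclusion(x, np.eye(3), p=0.1, v=10.0)
median_model = (probs > 0.5).astype(int)         # nu_i^* = I{pi(nu_i = 1 | X) > 0.5}
print(np.round(probs, 4), median_model)
```

With $\Sigma = I$ the posterior factorizes across coordinates, so at $x_1 = 0$ the result can be checked against the closed form $S/(1 + S)$ with $S = \frac{p}{1-p}(1 + V)^{-1/2}$.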


Recall that in a step-down method, we try to answer the following question at the first step: “Can at least one null hypothesis be rejected?”, which is equivalent to answering the question “Can the global null hypothesis be true?”. A natural approach to answering this question is as follows. To test the global null hypothesis, we confine our attention to the sub-space $\{(\nu_1, \dots, \nu_m) \in \{0,1\}^m : \sum_{i=1}^m \nu_i = 1\}$ of the original model space as the class of plausible alternatives to the global null. That is, we consider as plausible alternatives to the global null only those models each of which is a permutation of $(m - 1)$ true null hypotheses and exactly one false null hypothesis. For each model in this restricted sub-space, we compute the ratio of the posterior probability of that alternative model being true to that of the global null hypothesis. If the maximum of these ratios exceeds some pre-specified threshold, say $\delta > 0$, we conclude that the global null hypothesis cannot be true; it is rejected, along with the null hypothesis corresponding to the coordinate at which this maximum occurs, and we remove the corresponding $X_i$ and move on to the next stage. Otherwise, we accept all of them and stop. We continue in this fashion until an acceptance occurs or all the null hypotheses are exhausted.


We now formally introduce the proposed Bayesian step-down procedure as follows: 

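We adopt notational conventions similar to those used in Cohen, Sackrowitz and Xu (2009). Let $X^{(i_1,\dots,i_t)}$ be the $(m - t) \times 1$ vector consisting of those components of $X = (X_1, \dots, X_m)$ with $(X_{i_1}, \dots, X_{i_t})$ left out. Suppose $\Sigma^{(i_1,\dots,i_t)}$ is the $(m - t) \times (m - t)$ submatrix obtained after eliminating the $i_1, \dots, i_t$-th rows and the corresponding columns of $\Sigma$. Let $\sigma_j^{(i_1,\dots,i_t)}$ be the $(m - t - 1) \times 1$ vector obtained by eliminating the $i_1, \dots, i_t$-th and $j$-th elements of the $j$-th column vector of $\Sigma$. Further suppose that $\nu^{(i_1,\dots,i_t)}$ denotes the vector obtained from $\nu = (\nu_1, \dots, \nu_m)$ by leaving out $\nu_{i_1}, \dots, \nu_{i_t}$.

Let us define

$$S_{tj}^{(i_1,\dots,i_{t-1})}(X) = \frac{\pi\big(\nu_j = 1, \nu^{(i_1,\dots,i_{t-1},j)} = 0 \mid X^{(i_1,\dots,i_{t-1})}\big)}{\pi\big(\nu_j = 0, \nu^{(i_1,\dots,i_{t-1},j)} = 0 \mid X^{(i_1,\dots,i_{t-1})}\big)} \tag{2.7}$$

for $t, j = 1, \dots, m$, $1 \le i_1 \ne \cdots \ne i_{t-1} \le m$ and $i_l \ne j$ for all $l = 1, \dots, t - 1$. For $t = 1$ the superscripts are empty, so that $S_{1j}(X)$ is computed from the full vector $X$. For $t = 1, \dots, m$, let us define the indices $j_t(X)$ as

$$j_t(X) = \operatorname*{argmax}_{j \in \{1,\dots,m\} \setminus \{j_1(X), \dots, j_{t-1}(X)\}} S_{tj}^{(j_1(X),\dots,j_{t-1}(X))}(X). \tag{2.8}$$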

Given a pre-determined threshold $\delta > 0$, the proposed Bayesian Step Down (BSD) procedure is described below:

1. At stage 1, let us consider the functions $S_{1j}(X)$, $j \in \{1, \dots, m\}$. If $S_{1 j_1(X)}(X) \leq \delta$, we stop and accept all $H_{0i}$'s. Otherwise we reject $H_{0 j_1(X)}$ and continue to stage 2.

2. At stage 2, we consider the functions $S_{2j}^{(j_1(X))}(X)$, $j \in \{1, \dots, m\} \setminus \{j_1(X)\}$. If $S_{2 j_2(X)}^{(j_1(X))}(X) \leq \delta$, we stop and accept all the remaining $H_{0i}$'s. Otherwise we reject $H_{0 j_2(X)}$ and continue to the next stage.

3. In general, at stage $t$, we consider the $(m - t + 1)$ functions $S_{tj}^{(j_1(X), \dots, j_{t-1}(X))}(X)$, where $j \in \{1, \dots, m\} \setminus \{j_1(X), \dots, j_{t-1}(X)\}$. If $S_{t j_t(X)}^{(j_1(X), \dots, j_{t-1}(X))}(X) \leq \delta$, we stop and accept all the remaining $H_{0i}$'s. Otherwise we reject $H_{0 j_t(X)}$ and move on to stage $t + 1$.

4. We continue in this way until one hypothesis is accepted or there are no more null hypotheses to be tested (that is, $t = m$), in which case we must stop.

Here the subscript $t$ denotes the stage of the BSD procedure. The above description defines a class of Bayesian testing procedures for various choices of $\delta > 0$. In this paper we work with the choice $\delta = 1$.
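The steps above can be sketched as follows, assuming NumPy and known $p$ and $V$ (all function names are hypothetical). By Bayes' theorem and Theorem 2.1 applied to the coordinates still in play, the ratio in (2.7) equals $\frac{p}{1-p}$ times the ratio of the $N(0, \Sigma^{(\cdot)} + V e_j e_j')$ and $N(0, \Sigma^{(\cdot)})$ densities at $X^{(\cdot)}$, where $e_j$ is the unit vector for coordinate $j$ and $(\cdot)$ abbreviates the already rejected indices. The sketch works on the log scale to avoid overflow and uses dense linear solves at every stage, so it does not reflect the cheaper representation of Section 5.

```python
import numpy as np


def log_mvn0(x, cov):
    """Log density of N(0, cov) evaluated at x."""
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (x.size * np.log(2 * np.pi) + logdet + x @ np.linalg.solve(cov, x))


def log_bsd_statistics(x, sigma, remaining, p, v):
    """log S_tj for each j in `remaining`, as log(p/(1-p)) + log f(. | nu_j = 1) - log f(. | all 0)."""
    xr = x[remaining]
    sr = sigma[np.ix_(remaining, remaining)]     # covariance of the remaining coordinates
    log_f0 = log_mvn0(xr, sr)                    # all remaining nu's equal to 0
    out = {}
    for k, j in enumerate(remaining):
        cov1 = sr.copy()
        cov1[k, k] += v                          # nu_j = 1 adds V to the j-th diagonal entry only
        out[j] = np.log(p / (1 - p)) + log_mvn0(xr, cov1) - log_f0
    return out


def bsd(x, sigma, p, v, delta=1.0):
    """Hypothetical sketch of the BSD step-down rule; returns rejected indices in order."""
    remaining, rejected = list(range(x.size)), []
    while remaining:
        log_s = log_bsd_statistics(x, sigma, remaining, p, v)
        j_t = max(log_s, key=log_s.get)          # the index j_t(X) of (2.8)
        if log_s[j_t] <= np.log(delta):          # largest statistic <= delta: accept the rest
            break
        rejected.append(j_t)
        remaining.remove(j_t)
    return rejected


print(bsd(np.array([0.0, 0.0, 6.0]), np.eye(3), p=0.1, v=10.0))  # [2]
```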


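Observe that, dividing the numerator and the denominator of (2.7) by $\pi\big(\nu^{(i_1,\dots,i_{t-1},j)} = 0 \mid X^{(i_1,\dots,i_{t-1})}\big)$, the statistic $S_{tj}^{(i_1,\dots,i_{t-1})}(X)$ may alternatively be written as

$$S_{tj}^{(i_1,\dots,i_{t-1})}(X) = \frac{\pi\big(\nu_j = 1 \mid \nu^{(i_1,\dots,i_{t-1},j)} = 0, X^{(i_1,\dots,i_{t-1})}\big)}{\pi\big(\nu_j = 0 \mid \nu^{(i_1,\dots,i_{t-1},j)} = 0, X^{(i_1,\dots,i_{t-1})}\big)}$$

for $t, j = 1, \dots, m$, $1 \le i_1 \ne \cdots \ne i_{t-1} \le m$ and $i_l \ne j$ for all $l = 1, \dots, t - 1$. Thus the BSD test statistic at stage $t$ may be interpreted as the ratio of the posterior probabilities of $H_{Aj}$ being true and $H_{0j}$ being true, given that the rest of the null hypotheses still under consideration at stage $t$ are true.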
3 A Decision Theoretic Justification for the BSD Procedure


In this section, we give an important decision theoretic justification for the BSD method from a frequentist viewpoint. In particular, we show that the proposed BSD method based on the statistics $S_{tj}$ in (2.7) is admissible when $X$ is assumed to follow a multivariate normal distribution with a fixed but unknown mean vector and a known covariance matrix $\Sigma$. Note that any multiple testing procedure $\Phi(x) = (\phi_1(x), \dots, \phi_m(x))$ induces an individual test function $\phi_i$ for testing $H_{0i}$ against $H_{Ai}$, where $\phi_i(x)$ denotes the probability of rejecting the $i$-th null hypothesis $H_{0i}$ when the data point $X = x$ is observed. The risk function corresponding to $\phi_i$ is

$$R_i(\phi_i(X), \mu) = \big(1 - I\{\mu_i \neq 0\}\big)\, E_\mu\big(\phi_i(X)\big) + I\{\mu_i \neq 0\}\, E_\mu\big(1 - \phi_i(X)\big). \tag{3.1}$$

We take the risk function of the procedure $\Phi(X)$ to be the vector risk function

$$R(\Phi(X), \mu) = \big(R_1(\phi_1(X), \mu), \dots, R_m(\phi_m(X), \mu)\big). \tag{3.2}$$

A multiple testing procedure $\Phi(X)$ is said to be inadmissible with respect to the vector risk function (3.2) if there exists another multiple testing procedure $\Phi^*(X)$ such that $R_j(\phi_j^*(X), \mu) \leq R_j(\phi_j(X), \mu)$ for all $j = 1, \dots, m$ and all $\mu \in \mathbb{R}^m$, with strict inequality holding for at least one $j$ and some $\mu \in \mathbb{R}^m$. A multiple testing procedure is admissible if it is not inadmissible.
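The risk in (3.1) can be estimated by simulation for any non-randomized procedure, which is a convenient way to compare individual tests numerically. The sketch below assumes NumPy; the name `risk_vector` and the unadjusted two-sided z-test used as an example are illustrative choices, not procedures from the text.

```python
import numpy as np


def risk_vector(procedure, mu, sigma, n_rep=2000, seed=0):
    """Monte Carlo estimate of the vector risk (3.2) for a fixed mean vector mu.

    `procedure` maps an observed x to the indices it rejects (a non-randomized rule).
    """
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu, dtype=float)
    null = mu == 0
    errors = np.zeros(mu.size)
    for _ in range(n_rep):
        x = rng.multivariate_normal(mu, sigma)
        reject = np.zeros(mu.size, dtype=bool)
        for i in procedure(x):
            reject[i] = True
        errors += np.where(null, reject, ~reject)  # type I error if mu_i = 0, else type II
    return errors / n_rep


# Illustration with an unadjusted two-sided z-test at level 0.05.
def marginal_z_test(x):
    return [i for i in range(x.size) if abs(x[i]) > 1.96]


r = risk_vector(marginal_z_test, [0.0, 3.0], np.eye(2))
print(r)  # close to the exact values 0.05 and about 0.149
```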

As mentioned in the Introduction, in many frequently occurring situations the typical p-value based stepwise multiple testing procedures are inadmissible with respect to the above vector risk function (3.2), and given such a stepwise testing procedure, it is always possible to construct other multiple testing procedures whose individual tests have smaller risk than those of the given procedure. Moreover, a procedure that is inadmissible with respect to the vector risk function (3.2) will necessarily be inadmissible whenever the overall risk is an increasing function of the expected numbers of type I and type II errors, such as when the risk is the expected number of misclassified hypotheses. One would certainly not want a multiple testing procedure to be inadmissible. However, as we shall see later in this section, our proposed BSD method possesses the desired admissibility property.


3.1 Connection to the MRD method 


Before we move further, we first establish an important connection between the proposed BSD procedure and the Maximum Residual Down (MRD) method introduced by Cohen, Sackrowitz and Xu (2009), by showing that there exists a functional relationship between the proposed BSD statistics and the MRD statistics. This result is essential for showing that our proposed multiple testing procedure based on the BSD statistics is admissible for the present testing problem. Recall that the MRD method due to Cohen, Sackrowitz and Xu (2009) is based on the set of adaptively formed residuals defined as

$$U_{tj}^{(i_1,\dots,i_{t-1})}(X) = \frac{X_j - \sigma_j^{(i_1,\dots,i_{t-1})\prime}\big(\Sigma^{(i_1,\dots,i_{t-1},j)}\big)^{-1} X^{(i_1,\dots,i_{t-1},j)}}{\Big\{\sigma_{jj} - \sigma_j^{(i_1,\dots,i_{t-1})\prime}\big(\Sigma^{(i_1,\dots,i_{t-1},j)}\big)^{-1}\sigma_j^{(i_1,\dots,i_{t-1})}\Big\}^{1/2}}$$

for $t, j = 1, \dots, m$, $1 \le i_1 \ne \cdots \ne i_{t-1} \le m$ and $i_l \ne j$ for all $l = 1, \dots, t - 1$. For $1 \le t \le m$, we define the index $j'_t(X)$ as

$$j'_t(X) = \operatorname*{argmax}_{j \in \{1,\dots,m\} \setminus \{j'_1(X),\dots,j'_{t-1}(X)\}} \Big|U_{tj}^{(j'_1(X),\dots,j'_{t-1}(X))}(X)\Big|. \tag{3.3}$$


Given a set of positive constants $C_1 > C_2 > \cdots > C_m$, the MRD method works in a step-down
manner as follows:

1. At stage 1, consider the functions $|U_{1j}(X)|$, $j \in \{1,\dots,m\}$. If $|U_{1j'_1(X)}(X)| \le C_1$, we
stop and accept all $H_{0i}$'s. Otherwise we reject $H_{0j'_1(X)}$ and continue to stage 2.

2. At stage 2, consider the functions $|U_{2j}^{(j'_1(X))}(X)|$, $j \in \{1,\dots,m\}\setminus\{j'_1(X)\}$. If $|U_{2j'_2(X)}^{(j'_1(X))}(X)| \le C_2$, we stop and accept all the remaining $H_{0i}$'s. Otherwise we reject $H_{0j'_2(X)}$ and continue to
the next stage.

3. In general, at stage $t$, consider the $(m-t+1)$ functions $|U_{tj}^{(j'_1(X),\dots,j'_{t-1}(X))}(X)|$, where $j \in \{1,\dots,m\}\setminus\{j'_1(X),\dots,j'_{t-1}(X)\}$. If $|U_{tj'_t(X)}^{(j'_1(X),\dots,j'_{t-1}(X))}(X)| \le C_t$, we stop and accept all the
remaining $H_{0i}$'s. Otherwise we reject $H_{0j'_t(X)}$ and move on to stage $t+1$.

4. We continue in this fashion until an acceptance occurs or there are no more null hypotheses
to be tested, in which case we must stop.
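The step-down rule above can be sketched in Python as follows. This is a minimal illustration, not the implementation of Cohen, Sackrowitz and Xu (2009): it assumes the standardized-residual form of $U_{tj}$ written above, computes the residuals of the remaining coordinates from the inverse $\Omega$ of the corresponding submatrix of $\Sigma$ (using the identity $(X_j - E[X_j \mid \text{rest}])/\mathrm{sd}(X_j \mid \text{rest}) = (\Omega x)_j/\sqrt{\Omega_{jj}}$), and uses 0-based indices. The names `mrd_residuals` and `mrd_step_down` are hypothetical.

```python
import numpy as np


def mrd_residuals(x, sigma, remaining):
    """Standardized residuals U_tj for every j in `remaining`.

    Each X_j is regressed on the other remaining coordinates. With
    omega = inv(sigma[R, R]), the standardized residual of X_j equals
    (omega @ x_R)_j / sqrt(omega_jj).
    """
    idx = np.asarray(remaining)
    omega = np.linalg.inv(sigma[np.ix_(idx, idx)])
    return (omega @ x[idx]) / np.sqrt(np.diag(omega))


def mrd_step_down(x, sigma, crit):
    """Step-down rule with critical constants crit[0] = C_1, crit[1] = C_2, ...

    Returns the (0-based) indices of the rejected hypotheses, in the
    order in which they are rejected.
    """
    remaining = list(range(len(x)))
    rejected = []
    for t in range(len(x)):
        u = mrd_residuals(x, sigma, remaining)
        k = int(np.argmax(np.abs(u)))  # position of j'_t within `remaining`
        if abs(u[k]) <= crit[t]:
            break  # accept all remaining null hypotheses
        rejected.append(remaining.pop(k))
    return rejected


# With Sigma = I the residuals are just the coordinates themselves.
print(mrd_step_down(np.array([5.0, 0.1, -3.0]), np.eye(3), [2.0, 2.0, 2.0]))  # [0, 2]
```

Inverting only the submatrix of the coordinates still in play at each stage mirrors the adaptive definition of the residuals: once a hypothesis is rejected, its coordinate no longer enters any regression.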

Remark 3.1. Note that the indices $j_t(X)$ and $j'_t(X)$ defined in (2.8) and (3.3), respectively, need
not be the same.


The next theorem characterises the relationship between the proposed BSD method and the
MRD method of Cohen, Sackrowitz and Xu (2009).


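Theorem 3.1. Under the present set-up, the BSD statistics and the MRD statistics are associated
through the following functional relationship:

$$S_{tj}^{(i_1,\dots,i_{t-1})}(X) = \frac{p}{1-p}\sqrt{\frac{\sigma^2_{j\cdot(i_1,\dots,i_{t-1})}}{V + \sigma^2_{j\cdot(i_1,\dots,i_{t-1})}}}\,\exp\left\{\frac{V}{2\big(V + \sigma^2_{j\cdot(i_1,\dots,i_{t-1})}\big)}\Big[U_{tj}^{(i_1,\dots,i_{t-1})}(X)\Big]^2\right\}, \qquad (3.4)$$

where $\sigma^2_{j\cdot(i_1,\dots,i_{t-1})}$ denotes the conditional variance of $X_j$ given $X^{(i_1,\dots,i_{t-1},j)}$ when the covariance matrix of $X$ is $\Sigma$.

Proof. For brevity write $\mathcal{I} = (i_1,\dots,i_{t-1})$. Observe that one can write each test statistic $S_{tj}$ as

$$S_{tj}^{(\mathcal{I})}(X) = \frac{\pi\big(\nu_j = 1, \nu_{(\mathcal{I},j)} = 0 \mid X^{(\mathcal{I})}\big)}{\pi\big(\nu_j = 0, \nu_{(\mathcal{I},j)} = 0 \mid X^{(\mathcal{I})}\big)} = \frac{p}{1-p}\cdot\frac{f\big(X^{(\mathcal{I})} \mid \nu_j = 1, \nu_{(\mathcal{I},j)} = 0\big)}{f\big(X^{(\mathcal{I})} \mid \nu_j = 0, \nu_{(\mathcal{I},j)} = 0\big)}. \qquad (3.5)$$

Let us write

$$\Sigma^{(\mathcal{I})}_{0,j} = \Sigma^{(\mathcal{I})} + V\,\mathrm{diag}\big(\nu_j = 0, \nu_{(\mathcal{I},j)} = 0\big) \quad\text{and}\quad \Sigma^{(\mathcal{I})}_{1,j} = \Sigma^{(\mathcal{I})} + V\,\mathrm{diag}\big(\nu_j = 1, \nu_{(\mathcal{I},j)} = 0\big).$$

It is important to note that for any $t = 1,\dots,m$ and for any $j = 1,\dots,m$ with $i_l \neq j$ for all $l$,
we have the following:

$$\big|\Sigma^{(\mathcal{I})}_{0,j}\big| = \big|\Sigma^{(\mathcal{I})}\big| = \big|\Sigma^{(\mathcal{I})}_{(-j,-j)}\big|\,\sigma^2_{j\cdot\mathcal{I}} \qquad (3.6)$$

and

$$\big|\Sigma^{(\mathcal{I})}_{1,j}\big| = \big|\Sigma^{(\mathcal{I})}_{(-j,-j)}\big|\,\big(\sigma^2_{j\cdot\mathcal{I}} + V\big), \qquad (3.7)$$

which implies that, for any $t = 1,\dots,m$ and for any $j = 1,\dots,m$ with $i_l \neq j$ for all $l$, we have

$$\frac{\big|\Sigma^{(\mathcal{I})}_{0,j}\big|}{\big|\Sigma^{(\mathcal{I})}_{1,j}\big|} = \frac{\sigma^2_{j\cdot\mathcal{I}}}{V + \sigma^2_{j\cdot\mathcal{I}}}, \qquad (3.8)$$

where $A_{(-j,-j)}$ denotes the submatrix of a matrix $A$ obtained after removing the row and
column corresponding to $X_j$. Now, $f(X^{(\mathcal{I})} \mid \nu_j = 0, \nu_{(\mathcal{I},j)} = 0)$ corresponds to the probability density
function of a $N(0, \Sigma^{(\mathcal{I})}_{0,j})$ distribution. Therefore, using equation (2.4), one can write

$$f\big(X^{(\mathcal{I})} \mid \nu_j = 0, \nu_{(\mathcal{I},j)} = 0\big) = f_{N(M_j^{(\mathcal{I})}(X),\ \sigma^2_{j\cdot\mathcal{I}})}(X_j) \times f_{N(0,\ \Sigma^{(\mathcal{I})}_{(-j,-j)})}\big(X^{(\mathcal{I},j)}\big), \qquad (3.9)$$

where $M_j^{(\mathcal{I})}(X)$ is the conditional mean of $X_j$ given $X^{(\mathcal{I},j)}$ (cf. equation (2.7)), and the $f$ terms on the right hand
side denote the probability densities of the corresponding normal distributions evaluated at the appropriate points.

In a similar way one can write the following:

$$f\big(X^{(\mathcal{I})} \mid \nu_j = 1, \nu_{(\mathcal{I},j)} = 0\big) = f_{N(M_j^{(\mathcal{I})}(X),\ V + \sigma^2_{j\cdot\mathcal{I}})}(X_j) \times f_{N(0,\ \Sigma^{(\mathcal{I})}_{(-j,-j)})}\big(X^{(\mathcal{I},j)}\big). \qquad (3.10)$$

Equations (3.9) and (3.10), coupled with equation (3.5) and the fact that $U_{tj}^{(\mathcal{I})}(X) = \{X_j - M_j^{(\mathcal{I})}(X)\}/\sigma_{j\cdot\mathcal{I}}$, complete the proof of the present
theorem. ■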
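Theorem 3.1 is easy to check numerically. The sketch below (Python with NumPy; the function names are hypothetical) takes an arbitrary positive definite matrix standing in for $\Sigma^{(i_1,\dots,i_{t-1})}$ and compares, for each $j$, the density-ratio form (3.5) with the residual form (3.4), under the standardized-residual definition of $U_{tj}$ assumed above. The two numbers printed for each $j$ should agree up to rounding.

```python
import numpy as np


def log_normal_pdf(x, cov):
    """Log density of N(0, cov) evaluated at x."""
    _, logdet = np.linalg.slogdet(cov)
    quad = x @ np.linalg.solve(cov, x)
    return -0.5 * (len(x) * np.log(2 * np.pi) + logdet + quad)


def bsd_direct(x, sigma, j, p, V):
    """p/(1-p) * f(x | nu_j = 1, rest 0) / f(x | nu_j = 0, rest 0), as in (3.5)."""
    cov1 = sigma.copy()
    cov1[j, j] += V  # nu_j = 1 adds V to the j-th diagonal entry
    return p / (1 - p) * np.exp(log_normal_pdf(x, cov1) - log_normal_pdf(x, sigma))


def bsd_from_residual(x, sigma, j, p, V):
    """Right-hand side of (3.4) built from the standardized residual of X_j."""
    omega = np.linalg.inv(sigma)
    s2 = 1.0 / omega[j, j]             # conditional variance of X_j given the rest
    u = (omega[j] @ x) * np.sqrt(s2)   # standardized residual
    return p / (1 - p) * np.sqrt(s2 / (V + s2)) * np.exp(V * u**2 / (2 * (V + s2)))


rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
sigma = A @ A.T + 4 * np.eye(4)  # an arbitrary positive definite covariance
x = rng.standard_normal(4)
for j in range(4):
    print(j, bsd_direct(x, sigma, j, 0.1, 3.0), bsd_from_residual(x, sigma, j, 0.1, 3.0))
```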


3.2 Admissibility of the BSD Procedure 


In this subsection, we show that the proposed BSD method is admissible when $X$ is
assumed to follow a multivariate normal distribution with a fixed but unknown mean vector $\mu$ and


a known covariance matrix $\Sigma$. As in Cohen, Sackrowitz and Xu (2009), we shall use a result due


to Matthes and Truax (1967), which gives a necessary and sufficient condition for the admissibility
of a test of $H_{0i}$ vs $H_{ai}$ when the joint distribution of $X$ belongs to an exponential family. We
emphasise in this context that although the BSD test statistics can be expressed as functions of
the corresponding MRD statistics, admissibility of the BSD method does not follow as a direct
consequence of that of the MRD procedure. We adopt the broad architecture of the proof of
Cohen, Sackrowitz and Xu (2009). However, it should be carefully noted that, for each $t$, the
functional relationship (3.4) between the proposed BSD statistics and the corresponding MRD statistics
involves terms (the conditional variances appearing in (3.4)) that depend on the set of indices $j_1(x),\dots,j_{t-1}(x)$.

Each of these indices is a function of the observed vector $x$, and together they indicate

the null hypotheses that have already been rejected before the $t$-th stage. It therefore becomes
necessary to understand how these terms behave as functions of $x$ in the

decision making process when the data $x$ is observed. Such behaviour is crucial
for proving the admissibility of the proposed BSD method, as we shall see later in this section. In
this process, we establish that any step-down procedure based on a set of statistics $S_{tj}$ which are
strictly increasing functions of the corresponding $|U_{tj}|$'s is admissible, thus extending the
results of Cohen, Sackrowitz and Xu (2009) to the present multiple testing problem.


Let $\phi_j(x)$ denote the test function induced by the BSD procedure for testing $H_{0j}$ vs $H_{aj}$ when


we observe the data point $x$. The following result, due to Matthes and Truax (1967), pro-
vides a necessary and sufficient condition for the admissibility of a test of $H_{01}$
vs $H_{a1}$ when $\Sigma$ is known.

Let $Y = \Sigma^{-1}X$.

Lemma 3.1. A necessary and sufficient condition for a test $\phi(y)$ of $H_{01}$ vs $H_{a1}$ to be admissible
is that, for almost every fixed $y_2,\dots,y_m$, the acceptance region of the test is an interval in $y_1$.


Proof. See Matthes and Truax (1967).
Note that, for fixed $(y_2,\dots,y_m)$, to study the test function $\phi(y) = \phi_1(x)$ as $y_1$ varies, one may consider
sample points $x + rg$, where $g$ is the first column of $\Sigma$ and $r$ varies. This works since $y$ is a function
of $x$, and $y$ evaluated at $x + rg$ is $\Sigma^{-1}(x + rg) = y + (r, 0,\dots,0) = (y_1 + r, y_2,\dots,y_m)$.
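A quick numerical check of this shift identity (Python with NumPy; the covariance matrix, the point $x$ and the shift $r$ below are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
sigma = A @ A.T + 4 * np.eye(4)   # arbitrary positive definite Sigma
x = rng.standard_normal(4)
g = sigma[:, 0]                   # first column of Sigma
r = 0.7

y = np.linalg.solve(sigma, x)     # y = Sigma^{-1} x
y_shifted = np.linalg.solve(sigma, x + r * g)
# Only the first coordinate of y moves, and it moves by exactly r.
print(y_shifted - y)
```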

Lemma 3.2. The functions $U_{tj}$ defined above have the following properties.

(i) For $t \in \{1,\dots,m\}$ and for $j_1,\dots,j_{t-1} \in \{2,\dots,m\}$ with $j_i \neq j_{i'}$ for $i \neq i'$,

$$U_{t1}^{(j_1,\dots,j_{t-1})}(x + rg) = U_{t1}^{(j_1,\dots,j_{t-1})}(x) + r\,(\ldots).$$

(ii) For $t \in \{1,\dots,m\}$ and for $j \in \{2,\dots,m\}\setminus\{j_1,\dots,j_{t-1}\}$,

$$U_{tj}^{(j_1,\dots,j_{t-1})}(x + rg) = U_{tj}^{(j_1,\dots,j_{t-1})}(x).$$

Proof. See the proof of Lemma 3.2 of Cohen, Sackrowitz and Xu (2009).

Corollary 3.1. For any $r \in \mathbb{R}$, we have

$$U_{1j}(x + rg) = U_{1j}(x) \quad \text{for all } j \in \{2,\dots,m\},$$

which, in turn, implies the following:

$$S_{1j}(x + rg) = S_{1j}(x) \quad \text{for all } j \in \{2,\dots,m\}.$$
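Lemma 3.2 and Corollary 3.1 can likewise be illustrated numerically. The sketch below (Python with NumPy, 0-based indices, hypothetical helper `residuals`) again assumes the standardized-residual form of $U_{tj}$. It removes two coordinates other than the first, shifts $x$ along $g$, and checks that only the residual of the first coordinate changes, and that this change is linear and increasing in $r$.

```python
import numpy as np


def residuals(x, sigma, keep):
    """Standardized residuals of the coordinates listed in `keep`."""
    idx = np.asarray(keep)
    omega = np.linalg.inv(sigma[np.ix_(idx, idx)])
    return (omega @ x[idx]) / np.sqrt(np.diag(omega))


rng = np.random.default_rng(2)
m = 5
A = rng.standard_normal((m, m))
sigma = A @ A.T + m * np.eye(m)
x = rng.standard_normal(m)
g = sigma[:, 0]        # first column of Sigma
keep = [0, 2, 4]       # coordinates 1 and 3 (0-based) already removed

base = residuals(x, sigma, keep)
d1 = residuals(x + 1.0 * g, sigma, keep) - base
d2 = residuals(x + 2.0 * g, sigma, keep) - base
print(d1, d2)  # d1[1:], d2[1:] vanish up to rounding; d2[0] = 2 * d1[0] > 0
```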


Remark 3.2. Since ... > 0, it follows from Lemma 3.2 that, for any fixed $x \in \mathbb{R}^m$ and

given any $(j_1,\dots,j_{t-1})$, $|U_{t1}^{(j_1,\dots,j_{t-1})}(x + rg)|$ initially decreases with $r$ and then increases as $r$

increases. Also, for each fixed $x \in \mathbb{R}^m$ and given any $(j_1,\dots,j_{t-1})$, $U_{t1}^{(j_1,\dots,j_{t-1})}(x + rg)$ is a strictly

increasing function of $r$. Therefore, when $|U_{t1}^{(j_1,\dots,j_{t-1})}(x + rg)|$ decreases in $r$, $U_{t1}^{(j_1,\dots,j_{t-1})}(x + rg)$

is negative, while when $|U_{t1}^{(j_1,\dots,j_{t-1})}(x + rg)|$ is increasing in $r$, $U_{t1}^{(j_1,\dots,j_{t-1})}(x + rg)$ is positive.


Remark 3.3. In Lemma 3.2 of Cohen, Sackrowitz and Xu (2009), the term ... was dropped,
most likely due to a typographical error. However, the presence of this term requires some ex-
tra attention in our situation. As already mentioned at the beginning of this section, this term
depends on the set of indices $j_1(x),\dots,j_{t-1}(x)$, each of which is a function of the
observed data vector $x$. It therefore becomes necessary to know how this term
behaves as $x$ varies. This will become evident through Lemma 3.3 given below.


Suppose $\phi_1(x^*) = 0$, that is, $x^*$ is an acceptance point of $H_{01}$. Then the
process must stop before $H_{01}$ gets rejected; suppose the testing procedure stops at some stage $t$
without rejecting $H_{01}$. Let $x^* + r_0g$ be a point of rejection of $H_{01}$, that is, $\phi_1(x^* + r_0g) = 1$, and let the
testing procedure reject $H_{01}$ at some stage $t_0$ when $x^* + r_0g$ is observed. The next lemma establishes
an important identity between the indices $j_l(x^* + r_0g)$ and $j_l(x^*)$, $1 \le l \le t_0 - 1$, which
shows that these indices remain invariant when $\min\{t, t_0\} > 1$.

Lemma 3.3. Under the conditions $\phi_1(x^*) = 0$ and $\phi_1(x^* + r_0g) = 1$, the following holds when $t > 1$
and $t_0 > 1$:

$$j_l(x^* + r_0g) = j_l(x^*) \quad \text{for all } l = 1,\dots,t_0 - 1.$$


Proof. See Appendix.


Lemma 3.3, coupled with Corollary 3.1, leads us to the following important result on the relation
between $t_0$ and $t$ defined above.

Lemma 3.4. Under the conditions $\phi_1(x^*) = 0$ and $\phi_1(x^* + r_0g) = 1$, the BSD procedure must
reject $H_{01}$ within $t$ steps when $x^* + r_0g$ is observed, that is, $t_0 \le t$, where $t_0$ and $t$ are defined as
above.

Proof. See Appendix.


Lemma 3.5. Suppose that for some $x^*$ and $r_0 > 0$, $\phi_1(x^*) = 0$ and $\phi_1(x^* + r_0g) = 1$. Then
$\phi_1(x^* + rg) = 1$ for all $r > r_0$.

Proof. See Appendix. ■

Theorem 3.2. For the given two-sided multiple testing problem, the BSD procedure based on the
statistics $S_{tj}$'s is admissible.

Proof. Recall that, for the present testing problem, a multiple testing procedure is admissible
with respect to the vector risk function whenever each of its individual tests is admissible. Now, using Lemma 3.1 and
Lemma 3.5, it follows that the test $\phi_1$ induced by the BSD method for testing $H_{01}$ vs $H_{a1}$
is admissible. The proof that the tests induced by the BSD method for the remaining
individual testing problems are admissible follows analogously. ■

Remark 3.4. A careful inspection of the proof of Lemma 3.5 in the Appendix reveals that one does not
need the functional form of the statistics $S_{tj}$'s to prove the convexity property stated in
that lemma. All that is needed there is the fact that the $S_{tj}$'s are strictly increasing functions of
the corresponding $|U_{tj}|$'s. This observation leads us to the following theorem, from which admissibility
of the MRD procedure of Cohen, Sackrowitz and Xu (2009) follows immediately.

Theorem 3.3. For the present two-sided multiple testing problem, any multiple testing procedure
based on a set of statistics $S_{tj}$, where the $S_{tj}$'s are strictly increasing functions of the absolute values of
the corresponding MRD statistics $|U_{tj}|$'s, is admissible with respect to the vector risk function.


4 Estimation of the proportion of non-nulls p and the variance V of the non-zero means

As already mentioned in the introduction, the BSD method proposed in Section 2 involves quan-
tities such as the proportion of true alternatives $p$, the variance $V$ of the distribution of the non-null
$\mu_i$'s, and the posterior probabilities of the form $\pi(\nu_j = k \mid \cdot)$, $k = 0,1$,

$t, j = 1,\dots,m$, $1 \le j_1 \neq \cdots \neq j_{t-1} \le m$ with $j_l \neq j$ for all $l$, which are usually not known in

practice and must be estimated from the data. One natural approach is to place
appropriate hyperpriors on $p$ and $V$ and then find their full Bayes estimates using
an appropriate MCMC algorithm. Moreover, using the full Bayes estimates of $p$ and
$V$, one can obtain an empirical Bayes version of the BSD procedure by plugging those
estimates into the functional relation (3.4) of Theorem 3.1. However, finding an efficient MCMC
algorithm that works well in high dimensions may not be easy, as traditional Gibbs sampling
and Metropolis-Hastings algorithms are known to break down in such situations. On the other
hand, by (3.4) of Theorem 3.1, if we can obtain good estimates $\hat p$ and $\hat V$
from the data by other means, then plugging those estimates into (3.4) allows one to compute
the BSD statistics directly. In this paper, we adopt the latter approach and thus avoid the need for
MCMC-type computations.


We now turn our attention to the estimation of $p$ and $V$. Consistent estimation of the theoretical
proportion of true alternatives has been one of the most important and challenging tasks in
multiple testing and related inferential problems. Recall that the classical Bonferroni procedure
and the BH method are both conservative by a factor of $(1-p)$ at their respective FWER and
FDR significance levels. Hence, by exploiting an estimate $\hat p$ of the theoretical
proportion of true alternatives $p$ and replacing $\alpha/m$ by $\alpha/\{m(1-\hat p)\}$ in the corresponding critical
constants, one can enhance the power of these multiple testing procedures substantially. Moreover,
there are situations where one may be more interested in the proportion of truly active signals
than in detecting the true signals; see Jin (2006) and Jin (2008) in this context. A number of
efforts have been made towards the estimation of this proportion. Examples of early works include
Benjamini and Hochberg (2000), Efron et al. (2001), Storey (2002), Storey et al. (2004) and Genovese
and Wasserman (2004). The performance of these methods is, however, limited to situations when $p$
is very small, and they are found to be inconsistent in general. In a pioneering work, Efron (2004)
considers a Gaussian mixture model approach as in Section 2 and proposes a natural estimate of the
proportion of true nulls $(1-p)$ based on the empirical distribution of the test statistics $X_i$'s. His
method works particularly well when $p < 0.1$. Efron's estimate, however, is found to be inconsistent
in general and tends to underestimate $p$, especially when $p$ is moderately large. In an important arti-
cle, Meinshausen and Rice (2006) propose a $100(1-\alpha)$ percent lower confidence bound for $p$ based
on the empirical distribution of the underlying p-values. Their method, however, is conservative
and, in general, inconsistent.


A major theoretical breakthrough in this direction was made by Jin (2006). He proposes
an estimator of $p$ by exploiting concepts from Fourier analysis when the underlying test
statistics are independent and identically distributed according to a Gaussian mixture model as in
Section 2. His estimator is based on the central idea of approximating what he calls the
underlying characteristic function by the corresponding empirical characteristic function when the
null parameter values are identical (homogeneous), and it is shown to be uniformly consistent over
a large parameter space. A detailed discussion on the construction of such estimators can be found
in Jin (2006) and Jin (2008). In another important paper, Jin and Cai (2007) extend these works
by consistently estimating the null parameters along with the proportion of non-nulls when
the null distributions are assumed to be unknown and the null parameters are heterogeneous. Their
estimators are shown to be consistent over a large parameter space, and also when the
test statistics exhibit $\alpha$-mixing and short-range dependence. These estimators, though consistent,
fail to attain any optimal rate of convergence. In a more recent work, Cai and Jin (2010) consider
the problem of finding consistent estimators of the null parameters and the proportion of non-nulls
which attain the corresponding minimax optimal rates of convergence under an i.i.d. Gaussian
mixture framework. For any fixed $\gamma \in (0, 1/2)$, they propose the following estimate of $p$:

$$\hat p(\gamma) = 1 - \frac{m^{\gamma}}{m}\sum_{i=1}^{m}\cos\big(\sqrt{2\gamma\log m}\,X_i\big), \qquad (4.1)$$

and show that this estimator of $p$ attains the corresponding minimax rate when
the parameter $p$ is not too small compared to the total number of tests $m$ and "vanishes asymp-
totically" as $m$ grows to infinity. Cai and Jin (2010) conjecture that the estimator of $p$ in
(4.1) remains consistent under certain weakly correlated structures. In this paper, we
use the estimator $\hat p(\gamma)$ defined in (4.1) above for estimating the proportion $p$ of non-null hypothe-
ses. We show that the conjecture of Cai and Jin (2010) holds in the sense that their
estimator remains consistent under certain weakly correlated structures, such as finite block depen-
dence, short-range dependence, and an intraclass correlation model where the common correlation
coefficient goes to zero at an appropriate rate as the number of tests grows to infinity, and also in
situations when $p$ is moderately sparse. It should be emphasised in this context that estimation of
the theoretical proportion of true alternatives $p$ under stronger forms of dependence is a very
difficult problem and is beyond the scope of the present study. We hope to address this
problem elsewhere in future. We further assume that the variance parameter $V = V_m$ of the
alternative distribution of the $\mu_i$'s varies with $m$. It is natural to assume $V_m$ to be large when $m$
is large, so that large signals occur with positive probability and can thus be detected easily.
So $V_m$ is assumed to go to infinity at an appropriate rate as $m$ grows to infinity. We consider the
following asymptotic framework, under which the aforesaid consistency results hold:

Assumption 4.1.

1. $p = p_m \to 0$ as $m \to \infty$ (asymptotically vanishing sparsity).

2. $V = V_m \to \infty$ as $m \to \infty$.

3. $p_mV_m = O(1)$ as $m \to \infty$.


The following theorem shows that, under Assumption 4.1, $\hat p(\gamma)$ in (4.1) consistently estimates
the proportion $p_m$ of true alternatives for certain weakly correlated structures $\Sigma$.


Theorem 4.1. Consider the mixture model in Theorem 2.1, where $\Sigma = ((\sigma_{jj'}))$ denotes the
correlation matrix associated with the random vector $X$. Let $\hat p(\gamma)$ be the estimator of $p = p_m$ as
defined in (4.1). Then, under Assumption 4.1, if $0 < \gamma < 1/2$ is such that $(\dots) \to \infty$ as
$m \to \infty$, and

$$\lim_{m\to\infty}\frac{1}{m^2p_m^2}\sum(\dots) = 0, \qquad (4.2)$$

then the following holds:

$$\frac{\hat p(\gamma)}{p_m} \xrightarrow{\ \Pr\ } 1 \quad \text{as } m \to \infty, \qquad (4.3)$$

where the above convergence in probability is taken with respect to the joint distribution of $X_1, X_2, \dots$
given by (2.4).

Proof. See Appendix. ■
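To illustrate (4.1), the sketch below (Python with NumPy) implements $\hat p(\gamma) = 1 - m^{\gamma-1}\sum_i\cos(\sqrt{2\gamma\log m}\,X_i)$ and applies it to data simulated in the independent case $\Sigma = I$. Under the marginal $(1-p)N(0,1) + pN(0,1+V)$ one has $E\cos(tX_i) = (1-p)e^{-t^2/2} + p\,e^{-t^2(1+V)/2}$, so with $t = \sqrt{2\gamma\log m}$ the expectation of $\hat p(\gamma)$ is $p(1 - m^{-\gamma V})$, close to $p$ when $V$ is large. The sample size, $p$, $V$ and $\gamma$ are arbitrary illustrative choices, and the independent design is only a special case of the weak-dependence settings covered by Theorem 4.1.

```python
import numpy as np


def p_hat(x, gamma):
    """Cosine-based estimate of the non-null proportion, as in (4.1)."""
    m = len(x)
    t = np.sqrt(2 * gamma * np.log(m))
    # m**gamma = exp(t**2 / 2) undoes the damping of the N(0, 1) characteristic function
    return 1.0 - m**gamma * np.mean(np.cos(t * x))


rng = np.random.default_rng(0)
m, p, V, gamma = 10**6, 0.2, 50.0, 0.2
nu = rng.random(m) < p                                   # True marks a non-null
mu = np.where(nu, rng.normal(0.0, np.sqrt(V), m), 0.0)   # non-null means ~ N(0, V)
x = mu + rng.standard_normal(m)                          # Sigma = I
est = p_hat(x, gamma)
print(est)  # close to 0.2; the standard error here is about 0.01
```

The factor $m^{\gamma}$ inflates the noise by the same amount, which is why $\gamma$ must stay below $1/2$: the standard error behaves roughly like $m^{\gamma - 1/2}$.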


We now turn our attention to the estimation of the variance $V = V_m$ of the alternative distribution
of the $\mu_i$'s. Since $\Sigma$ is assumed to be a correlation matrix, Theorem 2.1 implies that
the common marginal distribution of the $X_i$'s is given by

$$X_i \sim (1-p)N(0,1) + pN(0, 1+V), \qquad (4.4)$$

whence we have

$$E(X_i^2) = (1-p) + p(1+V) = 1 + pV, \quad \text{for each } i = 1,\dots,m. \qquad (4.5)$$

Based on (4.5), we consider the following moment-based estimator of $V$:

$$\hat V = \frac{1}{\hat p}\Big(\frac{1}{m}\sum_{i=1}^{m}X_i^2 - 1\Big), \qquad (4.6)$$

where $\hat p$ is an estimate of the proportion of true non-nulls $p$.


The next theorem shows that, whenever a consistent estimator of $p$ is available, the above
estimator $\hat V$ consistently estimates $V$ under fairly general conditions.

Theorem 4.2. Consider the mixture model (2.4) in Theorem 2.1, where $\Sigma = ((\sigma_{ij}))$ denotes the
correlation matrix associated with the random vector $X$. Suppose $\hat p$ is a consistent estimator of $p$.
Then, under Assumption 4.1, if $mp^2V^2 \to \infty$ as $m \to \infty$ and

$$\lim_{m\to\infty}\frac{1}{m^2p^2V^2}\sum(\dots) = 0, \qquad (4.7)$$

then

$$\frac{\hat V}{V} \xrightarrow{\ \Pr\ } 1 \quad \text{as } m \to \infty,$$

where the above convergence in probability is taken with respect to the joint distribution of $X_1, X_2, \dots$
in (2.4).

Proof. See Appendix.
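The moment estimator (4.6) is equally short to sketch (Python with NumPy; the simulation settings are arbitrary). The true $p$ is plugged in here so as to isolate the behaviour of $\hat V$; in practice an estimate such as $\hat p(\gamma)$ from (4.1) would be used instead.

```python
import numpy as np


def v_hat(x, p_est):
    """Moment estimator of V based on E(X_i^2) = 1 + pV."""
    return (np.mean(x**2) - 1.0) / p_est


rng = np.random.default_rng(1)
m, p, V = 10**6, 0.2, 50.0
nu = rng.random(m) < p
mu = np.where(nu, rng.normal(0.0, np.sqrt(V), m), 0.0)
x = mu + rng.standard_normal(m)   # independent case, Sigma = I
print(v_hat(x, p) / V)            # ratio should be close to 1
```

Because $\hat V$ divides by $\hat p$, any relative error in $\hat p$ passes directly into $\hat V$, which is why Theorem 4.2 requires a consistent estimator of $p$.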
5 Alternative Representation of BSD Statistics


Observe that in order to perform the BSD procedure based on the statistics $S_{tj}$'s as defined in
Section 2, at each stage $t$ we need to find the inverses of $m - t + 1$ submatrices of $\Sigma$, which can be
troublesome for the computation of both the BSD and the MRD procedures, especially when
$m$ is large. In this section we derive two important algebraic identities that lead us to an
alternative representation of the BSD statistics $S_{tj}$'s, resulting in substantial computational
savings and greatly facilitating the computation of both procedures.

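Lemma 5.1. For any arbitrary variance-covariance matrix $\Sigma$ and for any $\nu \in \{0,1\}^m$, we have the
following identity:

$$(\Sigma + VB_{\nu,\nu_i=1})^{-1} = (\Sigma + VB_{\nu,\nu_i=0})^{-1} - \frac{V\,b_i(\nu)\,b_i(\nu)^T}{1 + Vb_{ii}(\nu)},$$

where $B_{\nu,\nu_i=k}$ denotes $B_\nu = \mathrm{diag}(\nu_1,\dots,\nu_m)$ with its $i$-th diagonal entry set to $k \in \{0,1\}$, $b_i(\nu)$ denotes the $i$-th column vector of the matrix $(\Sigma + VB_{\nu,\nu_i=0})^{-1}$, and $b_{ii}(\nu)$ is the $i$-th
element of $b_i(\nu)$, that is, the $i$-th diagonal element of $(\Sigma + VB_{\nu,\nu_i=0})^{-1}$.

Proof. See Appendix. ■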


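As a consequence of Lemma 5.1, it immediately follows that, for any $x \in \mathbb{R}^m$ and for any $\nu \in \{0,1\}^m$,

$$x^T(\Sigma + VB_{\nu,\nu_i=1})^{-1}x = x^T(\Sigma + VB_{\nu,\nu_i=0})^{-1}x - \frac{V\,x^Tb_i(\nu)\,b_i(\nu)^Tx}{1 + Vb_{ii}(\nu)} = x^T(\Sigma + VB_{\nu,\nu_i=0})^{-1}x - \frac{V\big(\sum_{j=1}^{m}b_{ji}(\nu)\,x_j\big)^2}{1 + Vb_{ii}(\nu)},$$

where $b_{ji}(\nu)$ denotes the $j$-th element of $b_i(\nu)$ already defined in Lemma 5.1.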
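Lemma 5.1 is the Sherman-Morrison rank-one update applied to the perturbation $Ve_ie_i^T$. The sketch below (Python with NumPy; the matrix, $\nu$, $V$ and $i$ are arbitrary and indices are 0-based) checks the identity and its quadratic-form consequence numerically.

```python
import numpy as np

rng = np.random.default_rng(3)
m, V, i = 5, 2.5, 1
A = rng.standard_normal((m, m))
sigma = A @ A.T + m * np.eye(m)
nu = np.array([1.0, 0.0, 1.0, 0.0, 0.0])   # any nu with nu_i = 0

B0 = np.diag(nu)                            # B_{nu, nu_i = 0}
B1 = B0.copy()
B1[i, i] = 1.0                              # B_{nu, nu_i = 1}

inv0 = np.linalg.inv(sigma + V * B0)
b = inv0[:, i]                              # b_i(nu)
lhs = np.linalg.inv(sigma + V * B1)
rhs = inv0 - V * np.outer(b, b) / (1.0 + V * b[i])   # rank-one update of Lemma 5.1

x = rng.standard_normal(m)
quad_lhs = x @ lhs @ x
quad_rhs = x @ inv0 @ x - V * (b @ x) ** 2 / (1.0 + V * b[i])
```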

Lemma 5.2. For any arbitrary positive definite covariance matrix $\Sigma$ and for any $\nu \in \{0,1\}^m$, we
have the following identity:

$$\frac{|\Sigma + VB_{\nu,\nu_i=1}|}{|\Sigma + VB_{\nu,\nu_i=0}|} = 1 + Vb_{ii}(\nu),$$

where $b_{ii}(\nu)$ denotes the $i$-th diagonal element of the matrix $(\Sigma + VB_{\nu,\nu_i=0})^{-1}$.

Proof. See Appendix. ■


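Lemma 5.3. For any arbitrary positive definite covariance matrix $\Sigma$ and for any $\nu \in \{0,1\}^m$, we
have the following identities:

$$b_{ii}(\nu) = \Big(\sigma_{ii} - \sigma_{(-i)}^T\,\big(\Sigma + VB_{\nu,\nu_i=0}\big)_{(-i,-i)}^{-1}\,\sigma_{(-i)}\Big)^{-1}$$

and

$$b_{(-i)}(\nu) = -\,b_{ii}(\nu)\,\big(\Sigma + VB_{\nu,\nu_i=0}\big)_{(-i,-i)}^{-1}\,\sigma_{(-i)},$$

where $b_i(\nu)$ is the same as in Lemma 5.1, $b_{(-i)}(\nu)$ denotes the vector obtained from $b_i(\nu)$ after
removing its $i$-th coordinate, $\sigma_{(-i)}$ denotes the $i$-th column of $\Sigma$ with its $i$-th entry removed, and $(\Sigma + VB_{\nu,\nu_i=0})_{(-i,-i)}$ is the submatrix obtained by removing the
$i$-th row and the $i$-th column of $\Sigma + VB_{\nu,\nu_i=0}$.

Proof. See Appendix. ■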
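Lemmas 5.2 and 5.3 are the matrix determinant lemma and the block-inverse formula, respectively. The sketch below (Python with NumPy; arbitrary matrix, $\nu$ with $\nu_i = 0$, 0-based indices) checks both by comparing against a full inverse and full determinants.

```python
import numpy as np

rng = np.random.default_rng(4)
m, V, i = 5, 2.5, 2
A = rng.standard_normal((m, m))
sigma = A @ A.T + m * np.eye(m)
nu = np.array([1.0, 0.0, 0.0, 1.0, 0.0])   # nu_i = 0 for i = 2
B0 = np.diag(nu)
B1 = B0.copy()
B1[i, i] = 1.0

M0 = sigma + V * B0
b = np.linalg.inv(M0)[:, i]                # b_i(nu)

# Lemma 5.2: ratio of determinants
ratio = np.linalg.det(sigma + V * B1) / np.linalg.det(M0)

# Lemma 5.3: b_ii(nu) and b_(-i)(nu) from the (-i, -i) block only
rest = [k for k in range(m) if k != i]
s = sigma[rest, i]                         # i-th column of Sigma without its i-th entry
w = np.linalg.solve(M0[np.ix_(rest, rest)], s)
b_ii = 1.0 / (sigma[i, i] - s @ w)
b_rest = -b_ii * w
```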


Lemma 5.4. For any arbitrary positive definite covariance matrix $\Sigma$ and for each $i = 1,\dots,m$, we
have the following:

$$\frac{f(x \mid \nu_i = 1, \nu_{(i)} = 0)}{f(x \mid \nu_i = 0, \nu_{(i)} = 0)} = \Big(\frac{1}{1 + Vb_{ii}}\Big)^{1/2}\exp\Big\{\frac{V}{2(1 + Vb_{ii})}\Big(\sum_{k=1}^{m}b_{ki}\,x_k\Big)^2\Big\},$$

where $b_i = (b_{1i},\dots,b_{mi})^T$ denotes the $i$-th column vector of the precision matrix $\Sigma^{-1}$.

Proof. The proof follows immediately by combining the results of Lemmas 5.1 to 5.3. ■

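Theorem 5.1. For each step $t$ of the Bayesian Step Down procedure, the statistics $S_{tj}^{(i_1,\dots,i_{t-1})}$ can
equivalently be represented as

$$S_{tj}^{(i_1,\dots,i_{t-1})}(x) = \frac{p}{1-p}\Big(\frac{1}{1 + Vb^{(i_1,\dots,i_{t-1})}_{\beta_j\beta_j}}\Big)^{1/2}\exp\Big\{\frac{V}{2\big(1 + Vb^{(i_1,\dots,i_{t-1})}_{\beta_j\beta_j}\big)}\Big(\sum_{k}b^{(i_1,\dots,i_{t-1})}_{k\beta_j}\,x_k\Big)^2\Big\},$$

where $b^{(i_1,\dots,i_{t-1})}_{\beta_j}$ denotes the $\beta_j$-th column vector of the matrix $(\Sigma^{(i_1,\dots,i_{t-1})})^{-1}$, $\beta_j$ being the position of
$X_j$ among the remaining $X_i$'s after leaving out $X_{i_1},\dots,X_{i_{t-1}}$, and the summation within the square
in the exponent on the right hand side is taken over the appropriate set of indices.

Proof. The proof follows as an immediate consequence of Lemma 5.4. ■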


An immediate consequence of Theorem 5.1 is that at the $t$-th stage of the BSD procedure, we
do not need to find the inverse of each of the $(m-t+1)$ submatrices $\Sigma^{(j_1(x),\dots,j_{t-1}(x),j)}$, nor

do we need to find the $(m-t+1)$ ratios of determinants

$$\frac{|\Sigma^{(j_1(x),\dots,j_{t-1}(x))}|}{|\Sigma^{(j_1(x),\dots,j_{t-1}(x))} + V\,\mathrm{diag}(\nu_j = 1, \nu_{(j_1(x),\dots,j_{t-1}(x),j)} = 0)|} \qquad (5.1)$$

at each step $t$, for $t = 1,\dots,m$, which might be troublesome from a computational point of
view even for moderately large $m$. Moreover, while computing the ratio of determinants in
(5.1), the numerator and the denominator might individually be so small that the computer
reports them as zero, thus producing erroneous results. This might happen for some $\Sigma$, especially
when $m$ is large. The representation given by Theorem 5.1 helps us avoid all
such computational issues. We only need to compute the inverse of ..., whose column

vectors are then used for computing the $(m-t+1)$ BSD statistics at stage $t$. Thus the overall

BSD procedure becomes computationally much faster than its original formulation.
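The computational saving can be sketched as follows (Python with NumPy; the function names are hypothetical and indices are 0-based). `bsd_stage_stats` evaluates all stage-$t$ statistics of Theorem 5.1 from a single inverse of the remaining submatrix, while `bsd_stage_stats_slow` recomputes each statistic from the two normal densities, using log-determinants instead of raw determinants to sidestep the underflow issue mentioned above. The removed indices play the role of $j_1(x),\dots,j_{t-1}(x)$ and are chosen arbitrarily here.

```python
import numpy as np


def bsd_stage_stats(x, sigma, removed, p, V):
    """All stage-t statistics from one matrix inverse (Theorem 5.1)."""
    keep = [k for k in range(len(x)) if k not in set(removed)]
    omega = np.linalg.inv(sigma[np.ix_(keep, keep)])   # one inverse per stage
    d = np.diag(omega)                                  # b_{beta_j beta_j}
    z = omega @ x[keep]                                 # sum_k b_{k beta_j} x_k
    stats = p / (1 - p) / np.sqrt(1 + V * d) * np.exp(V * z**2 / (2 * (1 + V * d)))
    return keep, stats


def bsd_stage_stats_slow(x, sigma, removed, p, V):
    """Same statistics via the density ratio, one determinant pair per j."""
    keep = [k for k in range(len(x)) if k not in set(removed)]
    s, xs = sigma[np.ix_(keep, keep)], x[keep]
    out = []
    for pos in range(len(keep)):
        s1 = s.copy()
        s1[pos, pos] += V
        logdet0, logdet1 = np.linalg.slogdet(s)[1], np.linalg.slogdet(s1)[1]
        q0, q1 = xs @ np.linalg.solve(s, xs), xs @ np.linalg.solve(s1, xs)
        out.append(p / (1 - p) * np.exp(0.5 * (logdet0 - logdet1) - 0.5 * (q1 - q0)))
    return keep, np.array(out)


rng = np.random.default_rng(5)
A = rng.standard_normal((6, 6))
sigma = A @ A.T + 6 * np.eye(6)
x = rng.standard_normal(6)
keep, fast = bsd_stage_stats(x, sigma, removed=[1, 4], p=0.1, V=3.0)
_, slow = bsd_stage_stats_slow(x, sigma, removed=[1, 4], p=0.1, V=3.0)
```

The fast version costs one inversion per stage; the slow one costs two determinants and two solves per remaining hypothesis.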


6 Simulations 

We shall update the present version of this article soon to include the simulation results
in this section.


7 Appendix 


7.1 Some auxiliary results 

Proposition 7.1. For any $n \times 1$ vector $v$ and any $n \times n$ symmetric positive definite matrix $A$,

$$\int_{\mathbb{R}^n}\exp\Big(-\frac12w^TA^{-1}w + v^Tw\Big)\,dw = (2\pi)^{n/2}|A|^{1/2}\exp\Big(\frac12v^TAv\Big).$$

Proof. See Lemma B.1.1 of Santner, Williams and Notz (2003).
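Proposition 7.1 follows by completing the square, $-\frac12w^TA^{-1}w + v^Tw = -\frac12(w - Av)^TA^{-1}(w - Av) + \frac12v^TAv$. The sketch below (Python with NumPy; $A$ and $v$ are arbitrary) checks it in two dimensions with a Riemann sum on a fine grid.

```python
import numpy as np

A = np.array([[1.0, 0.3], [0.3, 0.5]])     # symmetric positive definite
v = np.array([0.4, -0.2])

h = 0.02
grid = np.arange(-10.0, 10.0, h)
W = np.stack(np.meshgrid(grid, grid, indexing="ij"), axis=-1)   # shape (n, n, 2)
quad = np.einsum("abi,ij,abj->ab", W, np.linalg.inv(A), W)      # w^T A^{-1} w
numeric = np.exp(-0.5 * quad + W @ v).sum() * h * h             # Riemann sum

# (2 pi)^{n/2} |A|^{1/2} exp(v^T A v / 2) with n = 2
closed = 2 * np.pi * np.sqrt(np.linalg.det(A)) * np.exp(0.5 * v @ A @ v)
```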


Proposition 7.2. Suppose that $B$ is any $n \times n$ nonsingular matrix, $C$ is any $r \times r$ nonsingular matrix,
and $A$ is an arbitrary $n \times r$ matrix such that $(A^TB^{-1}A + C)$ is nonsingular. Then $(B + AC^{-1}A^T)$
is an $n \times n$ nonsingular matrix with inverse given by

$$(B + AC^{-1}A^T)^{-1} = B^{-1} - B^{-1}A(A^TB^{-1}A + C)^{-1}A^TB^{-1}.$$

The above result from matrix algebra is popularly known as the Sherman-Morrison-Woodbury
(SMW) identity.

Proof. See Lemma B.3.2 of Santner, Williams and Notz (2003).
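A numerical check of the SMW identity in the form stated above (Python with NumPy; the dimensions and matrices are arbitrary, with $B$ and $C$ made positive definite so that all required inverses exist):

```python
import numpy as np

rng = np.random.default_rng(6)
n, r = 5, 2
G = rng.standard_normal((n, n))
B = G @ G.T + n * np.eye(n)      # n x n nonsingular
H = rng.standard_normal((r, r))
C = H @ H.T + r * np.eye(r)      # r x r nonsingular
A = rng.standard_normal((n, r))  # n x r

B_inv = np.linalg.inv(B)
lhs = np.linalg.inv(B + A @ np.linalg.inv(C) @ A.T)
rhs = B_inv - B_inv @ A @ np.linalg.inv(A.T @ B_inv @ A + C) @ A.T @ B_inv
```

The right-hand side only inverts $r \times r$ matrices once $B^{-1}$ is known, which is what makes the identity useful when $r \ll n$.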
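Proposition 7.3. Suppose

$$\begin{pmatrix}Z_1\\Z_2\end{pmatrix} \sim N_2\left(\begin{pmatrix}0\\0\end{pmatrix}, \begin{pmatrix}\sigma_1^2 & \rho\sigma_1\sigma_2\\ \rho\sigma_1\sigma_2 & \sigma_2^2\end{pmatrix}\right). \qquad (7.1)$$

Then

$$E\big[\cos(\sqrt{2\gamma\log m}\,Z_1)\cos(\sqrt{2\gamma\log m}\,Z_2)\big] = \frac12\Big[\exp\{-(\sigma_1^2 + \sigma_2^2 + 2\rho\sigma_1\sigma_2)\gamma\log m\} + \exp\{-(\sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2)\gamma\log m\}\Big]. \qquad (7.2)$$

Proof. Using Euler's formula, we have for all $x \in \mathbb{R}$,

$$\cos(x) = \frac12\big(e^{ix} + e^{-ix}\big),$$

where $i = \sqrt{-1}$. Therefore, writing $a = \sqrt{2\gamma\log m}$,

$$E[\cos(aZ_1)\cos(aZ_2)] = \frac14E\Big[e^{ia(Z_1+Z_2)} + e^{ia(Z_1-Z_2)} + e^{ia(-Z_1+Z_2)} + e^{ia(-Z_1-Z_2)}\Big]. \qquad (7.3)$$

Recall that if $Z \sim N_d(\theta, \Lambda)$, then the characteristic function of $Z$ is given by

$$\varphi_Z(s) = E\big(e^{is^TZ}\big) = \exp\Big(is^T\theta - \frac12s^T\Lambda s\Big). \qquad (7.4)$$

Using this with $\theta = 0$ and $a^2/2 = \gamma\log m$, we obtain

$$E\big[e^{ia(Z_1+Z_2)}\big] = e^{-(\sigma_1^2+\sigma_2^2+2\rho\sigma_1\sigma_2)\gamma\log m} = E\big[e^{ia(-Z_1-Z_2)}\big] \qquad (7.5)$$

and

$$E\big[e^{ia(-Z_1+Z_2)}\big] = e^{-(\sigma_1^2+\sigma_2^2-2\rho\sigma_1\sigma_2)\gamma\log m} = E\big[e^{ia(Z_1-Z_2)}\big], \qquad (7.6)$$

whence (7.2) follows from (7.3).

This completes the proof of Proposition 7.3. ■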
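Formula (7.2) can be checked by Monte Carlo. In the sketch below (Python with NumPy) the values of $\sigma_1$, $\sigma_2$, $\rho$, $\gamma$ and $m$ are arbitrary, and $10^6$ draws give a Monte Carlo error of the order of $10^{-3}$.

```python
import numpy as np


def expected_cos_product(s1, s2, rho, gamma, m):
    """Closed form (7.2) for E[cos(a Z1) cos(a Z2)], a = sqrt(2 gamma log m)."""
    c = gamma * np.log(m)
    return 0.5 * (np.exp(-(s1**2 + s2**2 + 2 * rho * s1 * s2) * c)
                  + np.exp(-(s1**2 + s2**2 - 2 * rho * s1 * s2) * c))


s1, s2, rho, gamma, m = 1.0, 0.8, 0.6, 0.1, 1000
cov = np.array([[s1**2, rho * s1 * s2], [rho * s1 * s2, s2**2]])
rng = np.random.default_rng(7)
z = rng.multivariate_normal([0.0, 0.0], cov, size=10**6)
a = np.sqrt(2 * gamma * np.log(m))
mc = np.mean(np.cos(a * z[:, 0]) * np.cos(a * z[:, 1]))
exact = expected_cos_product(s1, s2, rho, gamma, m)
```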

7.2 Proof of Theorem 2.1 

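Proof. Let us denote the vectors $(x_1,\dots,x_m)$, $(\mu_1,\dots,\mu_m)$ and $(\nu_1,\dots,\nu_m)$ by $x$, $\mu$ and $\nu$, respec-
tively. The corresponding likelihood function is given by

$$f(x \mid \mu, \nu) = \frac{\exp\{-\frac12(x-\mu)^T\Sigma^{-1}(x-\mu)\}}{(2\pi)^{m/2}|\Sigma|^{1/2}},$$

and the prior distribution of $\nu = (\nu_1,\dots,\nu_m)$ is given by

$$\pi(\nu) = \prod_{i=1}^{m}p^{\nu_i}(1-p)^{1-\nu_i} = p^{|\nu|}(1-p)^{m-|\nu|},$$

where, given $\nu$, $\mu_i = 0$ if $\nu_i = 0$ and the $\mu_i$'s with $\nu_i = 1$ are independent $N(0, V)$.

Therefore, the marginal distribution of $X$ is given by

$$f(x) = \sum_{\nu\in\{0,1\}^m}\pi(\nu)\,f(x \mid \nu), \qquad f(x \mid \nu) = \int_{\mathbb{R}^{|\nu|}}\frac{\exp\{-\frac12(\mu-x)^T\Sigma^{-1}(\mu-x)\}}{(2\pi)^{m/2}|\Sigma|^{1/2}}\prod_{i:\nu_i=1}\frac{\exp(-\mu_i^2/(2V))}{\sqrt{2\pi V}}\prod_{i:\nu_i=1}d\mu_i,$$

where $|\nu| = \sum_{i=1}^{m}\nu_i = \#\{i : \nu_i = 1\}$, $\mu_i = 0$ inside the integral whenever $\nu_i = 0$, and the sum is taken over all possible choices of
$\nu \in \{0,1\}^m$.


Fix $\nu \in \{0,1\}^m$. Observe that $(\mu - x)^T\Sigma^{-1}(\mu - x)$ can also be written as

$$(\mu - x)^T\Sigma^{-1}(\mu - x) = (\mu - x)^TP_\nu^TP_\nu\Sigma^{-1}P_\nu^TP_\nu(\mu - x) = \big(P_\nu(\mu - x)\big)^T\big(P_\nu\Sigma^{-1}P_\nu^T\big)\big(P_\nu(\mu - x)\big),$$

where $P_\nu$ is a permutation matrix such that the first $m - |\nu|$ components of $P_\nu(\mu - x)$ are pre-
cisely those components of $(\mu - x)$ for which $\nu_i = 0$, and the remaining $|\nu|$ components of $P_\nu(\mu - x)$
correspond to those components of $(\mu - x)$ for which $\nu_i = 1$.


For notational convenience, let us write

$$P_\nu(\mu - x) = \begin{pmatrix}\mu_{1,\nu} - x_{1,\nu}\\ \mu_{2,\nu} - x_{2,\nu}\end{pmatrix}.$$

Observe that $\mu_{1,\nu} = 0_{(m-|\nu|)\times 1}$. Then

$$\prod_{i:\nu_i=1}\frac{\exp(-\mu_i^2/(2V))}{\sqrt{2\pi V}} = \frac{\exp\{-\frac12\mu_{2,\nu}^T(VI_{|\nu|})^{-1}\mu_{2,\nu}\}}{(2\pi V)^{|\nu|/2}}.$$

Therefore,

$$f(x \mid \nu) = \int_{\mathbb{R}^{|\nu|}}\frac{\exp\Big\{-\frac12\begin{pmatrix}\mu_{1,\nu}-x_{1,\nu}\\ \mu_{2,\nu}-x_{2,\nu}\end{pmatrix}^T\Sigma_\nu^{-1}\begin{pmatrix}\mu_{1,\nu}-x_{1,\nu}\\ \mu_{2,\nu}-x_{2,\nu}\end{pmatrix}\Big\}}{(2\pi)^{m/2}|\Sigma|^{1/2}}\cdot\frac{\exp\{-\frac12\mu_{2,\nu}^T(VI_{|\nu|})^{-1}\mu_{2,\nu}\}}{(2\pi V)^{|\nu|/2}}\,d\mu_{2,\nu},$$

where $\Sigma_\nu$ is a non-singular matrix such that $\Sigma_\nu^{-1} = P_\nu\Sigma^{-1}P_\nu^T$. Since $P_\nu$ is a permutation matrix,
and hence orthogonal, we have

$$\Sigma_\nu = (P_\nu\Sigma^{-1}P_\nu^T)^{-1} = (P_\nu^T)^{-1}(\Sigma^{-1})^{-1}(P_\nu)^{-1} = P_\nu\Sigma P_\nu^T.$$

Consequently, $|\Sigma_\nu| = |P_\nu\Sigma P_\nu^T| = |\Sigma|$, so that $|\Sigma|$ in the above display may be replaced by $|\Sigma_\nu|$.

Let us partition the matrix $\Sigma_\nu$ as

$$\Sigma_\nu = \begin{pmatrix}\Sigma_{11,\nu} & \Sigma_{12,\nu}\\ \Sigma_{21,\nu} & \Sigma_{22,\nu}\end{pmatrix}.$$

Observe that

$$\frac{\exp\Big\{-\frac12\begin{pmatrix}\mu_{1,\nu}-x_{1,\nu}\\ \mu_{2,\nu}-x_{2,\nu}\end{pmatrix}^T\Sigma_\nu^{-1}\begin{pmatrix}\mu_{1,\nu}-x_{1,\nu}\\ \mu_{2,\nu}-x_{2,\nu}\end{pmatrix}\Big\}}{(2\pi)^{m/2}|\Sigma_\nu|^{1/2}}$$

is nothing but the probability density function of a $N_m\big((x_{1,\nu}, x_{2,\nu}), \Sigma_\nu\big)$ distribution evaluated at $(\mu_{1,\nu}, \mu_{2,\nu}) = (0, \mu_{2,\nu})$. It can be
written as the product of the density of $N_{m-|\nu|}(0, \Sigma_{11,\nu})$ evaluated at $x_{1,\nu}$ and the density of $N_{|\nu|}(w_\nu, \Sigma_{2|1,\nu})$ evaluated at $\mu_{2,\nu}$, where

$$w_\nu = x_{2,\nu} - \Sigma_{21,\nu}\Sigma_{11,\nu}^{-1}x_{1,\nu} \qquad\text{and}\qquad \Sigma_{2|1,\nu} = \Sigma_{22,\nu} - \Sigma_{21,\nu}\Sigma_{11,\nu}^{-1}\Sigma_{12,\nu}.$$

Hence

$$f(x \mid \nu) = \frac{\exp\{-\frac12x_{1,\nu}^T\Sigma_{11,\nu}^{-1}x_{1,\nu}\}}{(2\pi)^{(m-|\nu|)/2}|\Sigma_{11,\nu}|^{1/2}}\int_{\mathbb{R}^{|\nu|}}\frac{\exp\{-\frac12(\mu_{2,\nu}-w_\nu)^T\Sigma_{2|1,\nu}^{-1}(\mu_{2,\nu}-w_\nu)\}}{(2\pi)^{|\nu|/2}|\Sigma_{2|1,\nu}|^{1/2}}\cdot\frac{\exp\{-\frac12\mu_{2,\nu}^T(VI_{|\nu|})^{-1}\mu_{2,\nu}\}}{(2\pi V)^{|\nu|/2}}\,d\mu_{2,\nu}.$$

Now,

$$(\mu_{2,\nu}-w_\nu)^T\Sigma_{2|1,\nu}^{-1}(\mu_{2,\nu}-w_\nu) + \mu_{2,\nu}^T(VI_{|\nu|})^{-1}\mu_{2,\nu} = \mu_{2,\nu}^T\Lambda_\nu^{-1}\mu_{2,\nu} - 2\big(\Sigma_{2|1,\nu}^{-1}w_\nu\big)^T\mu_{2,\nu} + w_\nu^T\Sigma_{2|1,\nu}^{-1}w_\nu,$$

where $\Lambda_\nu = \big(\Sigma_{2|1,\nu}^{-1} + (VI_{|\nu|})^{-1}\big)^{-1}$. Therefore, using Proposition 7.1 with $A = \Lambda_\nu$ and $v = \Sigma_{2|1,\nu}^{-1}w_\nu$, it follows that

$$f(x \mid \nu) = \frac{\exp\{-\frac12x_{1,\nu}^T\Sigma_{11,\nu}^{-1}x_{1,\nu}\}}{(2\pi)^{(m-|\nu|)/2}|\Sigma_{11,\nu}|^{1/2}}\times\frac{|\Lambda_\nu|^{1/2}}{(2\pi)^{|\nu|/2}V^{|\nu|/2}|\Sigma_{2|1,\nu}|^{1/2}}\times\exp\Big\{-\frac12w_\nu^T\big(\Sigma_{2|1,\nu}^{-1} - \Sigma_{2|1,\nu}^{-1}\Lambda_\nu\Sigma_{2|1,\nu}^{-1}\big)w_\nu\Big\}.$$

Next, applying Proposition 7.2 with $B = \Sigma_{2|1,\nu}$, $A = I_{|\nu|}$ and $C = (VI_{|\nu|})^{-1}$, we have

$$\Sigma_{2|1,\nu}^{-1} - \Sigma_{2|1,\nu}^{-1}\Lambda_\nu\Sigma_{2|1,\nu}^{-1} = \Sigma_{2|1,\nu}^{-1} - \Sigma_{2|1,\nu}^{-1}\big(\Sigma_{2|1,\nu}^{-1} + (VI_{|\nu|})^{-1}\big)^{-1}\Sigma_{2|1,\nu}^{-1} = \big(\Sigma_{2|1,\nu} + VI_{|\nu|}\big)^{-1} = \big(\Sigma_{22,\nu} - \Sigma_{21,\nu}\Sigma_{11,\nu}^{-1}\Sigma_{12,\nu} + VI_{|\nu|}\big)^{-1}.$$

Again, note that $|\Lambda_\nu^{-1}| = |\Sigma_{2|1,\nu} + VI_{|\nu|}|/(V^{|\nu|}|\Sigma_{2|1,\nu}|)$ and $|\Sigma_\nu| = |\Sigma_{11,\nu}|\,|\Sigma_{2|1,\nu}|$. Therefore,

$$\frac{|\Lambda_\nu|^{1/2}}{|\Sigma_{11,\nu}|^{1/2}\,V^{|\nu|/2}\,|\Sigma_{2|1,\nu}|^{1/2}} = \frac{1}{|\Sigma_{11,\nu}|^{1/2}\,|\Sigma_{2|1,\nu} + VI_{|\nu|}|^{1/2}}.$$

Thus we have

$$f(x \mid \nu) = \frac{\exp\big\{-\frac12x_{1,\nu}^T\Sigma_{11,\nu}^{-1}x_{1,\nu} - \frac12(x_{2,\nu} - \Sigma_{21,\nu}\Sigma_{11,\nu}^{-1}x_{1,\nu})^T(\Sigma_{2|1,\nu} + VI_{|\nu|})^{-1}(x_{2,\nu} - \Sigma_{21,\nu}\Sigma_{11,\nu}^{-1}x_{1,\nu})\big\}}{(2\pi)^{m/2}\,|\Sigma_{11,\nu}|^{1/2}\,|\Sigma_{2|1,\nu} + VI_{|\nu|}|^{1/2}},$$

which can be rewritten as

$$f(x \mid \nu) = \frac{\exp\Big\{-\frac12\begin{pmatrix}x_{1,\nu}\\ x_{2,\nu}\end{pmatrix}^T\begin{pmatrix}\Sigma_{11,\nu} & \Sigma_{12,\nu}\\ \Sigma_{21,\nu} & \Sigma_{22,\nu} + VI_{|\nu|}\end{pmatrix}^{-1}\begin{pmatrix}x_{1,\nu}\\ x_{2,\nu}\end{pmatrix}\Big\}}{(2\pi)^{m/2}\begin{vmatrix}\Sigma_{11,\nu} & \Sigma_{12,\nu}\\ \Sigma_{21,\nu} & \Sigma_{22,\nu} + VI_{|\nu|}\end{vmatrix}^{1/2}}.$$

Now,

$$\begin{pmatrix}x_{1,\nu}\\ x_{2,\nu}\end{pmatrix} = P_\nu x \qquad\text{and}\qquad \begin{pmatrix}\Sigma_{11,\nu} & \Sigma_{12,\nu}\\ \Sigma_{21,\nu} & \Sigma_{22,\nu} + VI_{|\nu|}\end{pmatrix} = \Sigma_\nu + V\begin{pmatrix}0 & 0\\ 0 & I_{|\nu|}\end{pmatrix} = P_\nu(\Sigma + VB_\nu)P_\nu^T,$$

where $B_\nu = (\beta_{1,\nu},\dots,\beta_{m,\nu})$ with $\beta_{i,\nu} = \nu_ie_i$, $i = 1,\dots,m$, and $e_i$ denotes an $m \times 1$ vector whose
$i$-th element is 1 and the rest are 0's. That is, $B_\nu = \mathrm{diag}(\nu_1,\dots,\nu_m)$. Hence the exponent above equals $-\frac12x^T(\Sigma + VB_\nu)^{-1}x$, and it immediately follows that

$$\begin{vmatrix}\Sigma_{11,\nu} & \Sigma_{12,\nu}\\ \Sigma_{21,\nu} & \Sigma_{22,\nu} + VI_{|\nu|}\end{vmatrix} = \big|P_\nu(\Sigma + VB_\nu)P_\nu^T\big| = |\Sigma + VB_\nu|.$$

Hence we finally have

$$f(x \mid \nu) = \frac{\exp\{-\frac12x^T(\Sigma + VB_\nu)^{-1}x\}}{(2\pi)^{m/2}|\Sigma + VB_\nu|^{1/2}},$$

which means

$$X = (X_1,\dots,X_m) \mid \nu \sim N_m(0, \Sigma + VB_\nu).$$

Hence the marginal distribution of $X$ is given by

$$f(x) = \sum_{\nu\in\{0,1\}^m}\pi(\nu)\,f(x \mid \nu) = \sum_{\nu\in\{0,1\}^m}p^{|\nu|}(1-p)^{m-|\nu|}\,\frac{\exp\{-\frac12x^T(\Sigma + VB_\nu)^{-1}x\}}{(2\pi)^{m/2}|\Sigma + VB_\nu|^{1/2}}. \qquad ■$$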
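Theorem 2.1 says that, conditionally on $\nu$, $X \sim N_m(0, \Sigma + VB_\nu)$. The Monte Carlo sketch below (Python with NumPy; $\Sigma$, $\nu$, $V$ and the number of draws are arbitrary illustrative choices) simulates the hierarchical model for a fixed $\nu$ and compares the sample covariance of $X$ with $\Sigma + VB_\nu$.

```python
import numpy as np

rng = np.random.default_rng(8)
sigma = np.array([[1.0, 0.5, 0.2],
                  [0.5, 1.0, 0.3],
                  [0.2, 0.3, 1.0]])
nu = np.array([1.0, 0.0, 1.0])
V, n = 4.0, 200_000

L = np.linalg.cholesky(sigma)
mu = rng.normal(0.0, np.sqrt(V), size=(n, 3)) * nu   # mu_i ~ N(0, V) if nu_i = 1, else 0
X = mu + rng.standard_normal((n, 3)) @ L.T           # X | mu ~ N_3(mu, Sigma)
empirical = np.cov(X, rowvar=False)
target = sigma + V * np.diag(nu)
```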


7.3 Proof of Lemma 3.3 

Proof. First observe that since both $t > 1$ and $t_0 > 1$, one must have $j_1(x^* + r_0g) \neq 1$ and
$j_1(x^*) \neq 1$. Then, using Corollary 3.1, we obtain

$$j_1(x^* + r_0g) = \operatorname*{argmax}_{j\in\{2,\dots,m\}}S_{1j}(x^* + r_0g) = \operatorname*{argmax}_{j\in\{2,\dots,m\}}S_{1j}(x^*) = j_1(x^*).$$

Now, proceeding inductively in $l$ and using Lemma 3.2, it follows that, for all $l = 1,\dots,t_0 - 1$ with $j_l(x^* + r_0g) \neq 1$,
and for all $j \in \{2,\dots,m\}\setminus\{j_1(x^* + r_0g),\dots,j_{l-1}(x^* + r_0g)\}$,

$$U_{lj}^{(\cdots)}(x^* + r_0g) = U_{lj}^{(\cdots)}(x^*), \qquad (7.7)$$

where $(\cdots)$ abbreviates $(j_1(x^* + r_0g),\dots,j_{l-1}(x^* + r_0g))$, whence, since for a given set of indices each $S_{lj}$ depends on $x$ only through $|U_{lj}|$,

$$S_{lj}^{(\cdots)}(x^* + r_0g) = S_{lj}^{(\cdots)}(x^*). \qquad (7.8)$$

In particular, for all $l < t_0$,

$$S_{lj_l(x^*+r_0g)}^{(\cdots)}(x^* + r_0g) = S_{lj_l(x^*+r_0g)}^{(\cdots)}(x^*). \qquad (7.9)$$

Again, using Lemma 3.2, we have for all $l = 1,\dots,t_0 - 1$

$$U_{l1}^{(\cdots)}(x^* + r_0g) = U_{l1}^{(\cdots)}(x^*) + r_0\,(\ldots), \qquad (7.10)$$

which means that only the values of $S_{l1}^{(j_1(x^*+r_0g),\dots,j_{l-1}(x^*+r_0g))}$ change, for $l =
1,\dots,t_0 - 1$. Again, since $H_{01}$ is rejected at the $t_0$-th stage when $x^* + r_0g$ is observed, for each
$l = 1,\dots,t_0 - 1$, $S_{l1}^{(\cdots)}(x^* + r_0g)$ cannot be the maximum of the corresponding
$S_{lj}^{(\cdots)}(x^* + r_0g)$'s, since in that case $H_{01}$ would have been rejected before the
$t_0$-th step, which would be a contradiction. Using this observation and equations (7.8), (7.9) and
(7.10), it therefore follows that for any $1 \le l \le t_0 - 1$,

$$j_l(x^* + r_0g) = \operatorname*{argmax}_{j\in\{1,\dots,m\}\setminus\{j_1(x^*+r_0g),\dots,j_{l-1}(x^*+r_0g)\}}S_{lj}^{(\cdots)}(x^* + r_0g) = \operatorname*{argmax}_{j\in\{2,\dots,m\}\setminus\{j_1(x^*+r_0g),\dots,j_{l-1}(x^*+r_0g)\}}S_{lj}^{(\cdots)}(x^* + r_0g) = \operatorname*{argmax}_{j\in\{2,\dots,m\}\setminus\{j_1(x^*+r_0g),\dots,j_{l-1}(x^*+r_0g)\}}S_{lj}^{(\cdots)}(x^*) = j_l(x^*).$$

This completes the proof of Lemma 3.3. ■


7.4 Proof of Lemma 3.4 

Proof. First observe that when to = 1, since 4>i{x*) = 0 and (j>i{x* -|- rog) = 1, one cannot have 
t = 1 due to (EH- Therefore we must have t > 1 when to = 1. Thus the result is true when 
to = 1. However, the proof for the case when both t > 1 and to > 1, is non-trivial and requires a 
contrapositive argument and Lemma 3.3. 


Z and 


Since t > 1, we have (a;*) > 5 for all Z = 1, • • • , t — 1, with ji {x*) ^ 1 for each 




Ait (a;*) V" ^ ^ ‘-'tj 

for all j G {I,-- - ,m} \ {ji(a:*), • • • ,jt-i(a:*)}, withj/(a;*) ^ 1 for all Z G {I,-- - ,t- 1}. 


Suppose, on the contrary, that $t_0 > t$. Then

$$S_{t\,j_t(x^*+r_0g)}^{(j_1(x^*+r_0g),\ldots,j_{t-1}(x^*+r_0g))}(x^* + r_0 g) > \delta,$$

since otherwise the process would have stopped at stage $t$ without rejecting $H_{01}$ when $x^* + r_0 g$ is observed, which would be a contradiction.
Now, by Lemma 3.2 and a subsequent application of Lemma 3.3, it follows that

$$S_{t\,j_t(x^*+r_0g)}^{(j_1(x^*+r_0g),\ldots,j_{t-1}(x^*+r_0g))}(x^*) = S_{t\,j_t(x^*+r_0g)}^{(j_1(x^*+r_0g),\ldots,j_{t-1}(x^*+r_0g))}(x^* + r_0 g),$$

and thereby we have

$$S_{t\,j_t(x^*+r_0g)}^{(j_1(x^*+r_0g),\ldots,j_{t-1}(x^*+r_0g))}(x^*) > \delta.$$

This means that when $x^*$ is observed, the testing procedure cannot stop at stage $t$, which contradicts the definition of $t$. This completes the proof of Lemma 3.4. ■
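To make the mechanics that Lemmas 3.3 to 3.5 reason about concrete, here is a minimal sketch of a generic stepdown loop: at each stage the remaining index with the largest stage statistic is selected, and its hypothesis is rejected only if that statistic exceeds $\delta$; otherwise the procedure stops. The callable `stage_stat(j, rejected, x)` is a hypothetical stand-in for $S_{lj}^{(j_1,\ldots,j_{l-1})}(x)$, whose exact form is defined elsewhere in the paper; 0-based indices and smallest-index tie-breaking are our own assumptions.

```python
from typing import Callable, FrozenSet, List, Sequence

# Hypothetical signature: stage_stat(j, rejected, x) stands in for the paper's
# S_{lj}^{(j_1,...,j_{l-1})}(x), where `rejected` holds {j_1, ..., j_{l-1}}.
StageStat = Callable[[int, FrozenSet[int], Sequence[float]], float]


def stepdown(x: Sequence[float], stage_stat: StageStat, delta: float) -> List[int]:
    """Generic stepdown loop; returns 0-based indices in the order rejected."""
    rejected: List[int] = []
    remaining = set(range(len(x)))
    while remaining:
        done = frozenset(rejected)
        # j_l = argmax of the stage statistic over the not-yet-rejected indices;
        # max() keeps the first maximiser, so ties go to the smallest index.
        j_l = max(sorted(remaining), key=lambda j: stage_stat(j, done, x))
        if stage_stat(j_l, done, x) <= delta:
            break  # largest statistic does not exceed delta: stop here
        rejected.append(j_l)
        remaining.remove(j_l)
    return rejected
```

For example, with the toy statistic $|x_j|/(1+|x_j|)$ (which ignores the previously rejected set), $x = (3.0, 0.1, -2.0, 0.5)$ and $\delta = 0.6$, the loop rejects indices 0 and 2 and then stops.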


7.5 Proof of Lemma 3.5 

Proof. Let us first consider the situation when $t_0 > 1$.


Observe that when $t_0 > 1$, Lemma 3.3 gives $j_l(x^* + r_0 g) = j_l(x^*)$ for $l = 1, \ldots, t_0 - 1$, and we have

$$S_{t_0 1}^{(j_1(x^*),\ldots,j_{t_0-1}(x^*))}(x^*) < S_{t_0 1}^{(j_1(x^*),\ldots,j_{t_0-1}(x^*))}(x^* + r_0 g).$$


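Since, for given $j_1, \ldots, j_{l-1}$, $S_{lj}^{(j_1,\ldots,j_{l-1})}$ is a strictly increasing function of $|U_{lj}^{(j_1,\ldots,j_{l-1})}|$, it follows that

$$\left|U_{t_0 1}^{(j_1(x^*),\ldots,j_{t_0-1}(x^*))}(x^*)\right| < \left|U_{t_0 1}^{(j_1(x^*),\ldots,j_{t_0-1}(x^*))}(x^* + r_0 g)\right|, \qquad (7.11)$$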


whence it follows from Corollary 3.1 that

$$U_{t_0 1}^{(j_1(x^*),\ldots,j_{t_0-1}(x^*))}(x^* + r_0 g) > 0.$$


But for given $(j_1, \ldots, j_{t_0-1})$, $U_{t_0 1}^{(j_1,\ldots,j_{t_0-1})}(x^* + r g)$ is strictly increasing in $r$. Hence, for all $r > r_0$, we have

$$U_{t_0 1}^{(j_1(x^*),\ldots,j_{t_0-1}(x^*))}(x^* + r g) > U_{t_0 1}^{(j_1(x^*),\ldots,j_{t_0-1}(x^*))}(x^* + r_0 g) > 0. \qquad (7.12)$$


We now complete the proof by contradiction. Recall that we need to show $\phi_1(x^* + r g) = 1$ for all $r > r_0$. Suppose, on the contrary, that this is not true. Then there exists some $r_1 > r_0$ such that $\phi_1(x^* + r_1 g) = 0$. Let $t_1$ denote the step at which the testing procedure stops without rejecting $H_{01}$ when $x^* + r_1 g$ is observed. Then, by Lemma 3.4, we have $t_0 \le t_1$. Since $t_0 > 1$, Lemma 3.3 gives

$$j_l(x^* + r_1 g) = j_l(x^* + r_0 g) = j_l(x^*)$$

for all $l = 1, \ldots, t_0 - 1$. Again, replacing $x^*$ by $x^* + r_1 g$ and applying the preceding arguments, from (7.11) we obtain


$$\left|U_{t_0 1}^{(j_1(x^*),\ldots,j_{t_0-1}(x^*))}(x^* + r_1 g)\right| < \left|U_{t_0 1}^{(j_1(x^*),\ldots,j_{t_0-1}(x^*))}(x^* + r_0 g)\right|,$$

which contradicts (7.12). Therefore one must have $\phi_1(x^* + r g) = 1$ for all $r > r_0$ when $t_0 > 1$.


Next observe that when $t_0 = 1$, since $S_{1j}(x^* + r_0 g) = S_{1j}(x^*)$ for all $j \in \{2, \ldots, m\}$, one must have $S_{11}(x^*) < S_{11}(x^* + r_0 g)$. Therefore, using exactly the same arguments as before, we have

$$U_{11}(x^* + r g) > U_{11}(x^* + r_0 g) > 0 \quad \text{for all } r > r_0.$$


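Then for all $r > r_0$ we have

$$\begin{aligned}
S_{11}(x^* + r g) &> S_{11}(x^* + r_0 g)\\
&\ge S_{1j}(x^* + r_0 g) \quad \text{for all } j \in \{2, \ldots, m\} \quad [\text{since } t_0 = 1]\\
&= S_{1j}(x^*) \quad \text{for all } j \in \{2, \ldots, m\} \quad [\text{using Corollary 3.1}]\\
&= S_{1j}(x^* + r g) \quad \text{for all } j \in \{2, \ldots, m\} \quad [\text{using Corollary 3.1}],
\end{aligned}$$

which, together with $S_{11}(x^* + r g) > S_{11}(x^* + r_0 g) > \delta$ (the latter because $H_{01}$ is rejected at the first stage when $x^* + r_0 g$ is observed), implies that every $x^* + r g$ is a point of rejection for $H_{01}$ for all $r > r_0$, that is, $\phi_1(x^* + r g) = 1$ for all $r > r_0$ when $t_0 = 1$. This completes the proof of Lemma 3.5. ■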


7.6 Proof of Theorem 4.1 

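Proof. We first split $E\left(\frac{\hat p(\gamma)}{p} - 1\right)^2$ into two parts as

$$E\left(\frac{\hat p(\gamma)}{p} - 1\right)^2 = \left(\frac{E(\hat p(\gamma)) - p}{p}\right)^2 + \mathrm{Var}\left(\frac{\hat p(\gamma)}{p}\right), \qquad (7.13)$$

where

$$\mathrm{Var}\left(\frac{\hat p(\gamma)}{p}\right) = \frac{1}{m^{2(1-\gamma)}p^2}\left[\sum_{j=1}^m \mathrm{Var}\left(\cos(\sqrt{2\gamma\log m}\,X_j)\right) + \sum_{j \neq j'} \mathrm{Cov}\left(\cos(\sqrt{2\gamma\log m}\,X_j), \cos(\sqrt{2\gamma\log m}\,X_{j'})\right)\right]. \qquad (7.14)$$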


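Now observe that, using (2.3), for each $j = 1, \ldots, m$ we have

$$\begin{aligned}
E[\cos(\sqrt{2\gamma\log m}\,X_j)] &= \tfrac12\left[E\left(e^{i\sqrt{2\gamma\log m}\,X_j}\right) + E\left(e^{-i\sqrt{2\gamma\log m}\,X_j}\right)\right]\\
&= E\left(e^{i\sqrt{2\gamma\log m}\,X_j}\right)\\
&= (1-p)\,E\left(e^{i\sqrt{2\gamma\log m}\,X_j} \mid \nu_j = 0\right) + p\,E\left(e^{i\sqrt{2\gamma\log m}\,X_j} \mid \nu_j = 1\right)\\
&= (1-p)\,m^{-\gamma} + p\,m^{-\gamma(1+V)},
\end{aligned}$$

where $i = \sqrt{-1}$, whence it follows that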

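$$\begin{aligned}
|E(\hat p(\gamma)) - p| &= \left|1 - m^{\gamma} E[\cos(\sqrt{2\gamma\log m}\,X_1)] - p\right| \quad [\text{since } X_1 \overset{d}{=} X_j \text{ for all } j]\\
&= \left|1 - p - m^{\gamma}\left\{(1-p)\,m^{-\gamma} + p\,m^{-\gamma(1+V)}\right\}\right|\\
&= p\,m^{-\gamma V}.
\end{aligned} \qquad (7.15)$$

Clearly, from Assumption (A) we have

$$\lim_{m\to\infty} \frac{|E(\hat p(\gamma)) - p|}{p} = 0. \qquad (7.16)$$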


Let us now fix any $1 \le j, j' \le m$ with $j \neq j'$. Then, using Proposition 7.3, we obtain the following:

$$E[\cos(\sqrt{2\gamma\log m}\,X_j)\cos(\sqrt{2\gamma\log m}\,X_{j'})] = E_1 + E_2 + E_3 + E_4, \qquad (7.17)$$
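where

$$\begin{aligned}
E_1 &= (1-p)^2 E[\cos(\sqrt{2\gamma\log m}\,X_j)\cos(\sqrt{2\gamma\log m}\,X_{j'}) \mid \nu_j = 0, \nu_{j'} = 0]\\
&= \tfrac12(1-p)^2\left[e^{-2(1+\sigma_{jj'})\gamma\log m} + e^{-2(1-\sigma_{jj'})\gamma\log m}\right], \qquad (7.18)\\
E_2 &= p(1-p) E[\cos(\sqrt{2\gamma\log m}\,X_j)\cos(\sqrt{2\gamma\log m}\,X_{j'}) \mid \nu_j = 1, \nu_{j'} = 0]\\
&= \tfrac12 p(1-p)\left[e^{-(2+2\sigma_{jj'}+V)\gamma\log m} + e^{-(2-2\sigma_{jj'}+V)\gamma\log m}\right], \qquad (7.19)\\
E_3 &= p(1-p) E[\cos(\sqrt{2\gamma\log m}\,X_j)\cos(\sqrt{2\gamma\log m}\,X_{j'}) \mid \nu_j = 0, \nu_{j'} = 1]\\
&= \tfrac12 p(1-p)\left[e^{-(2+2\sigma_{jj'}+V)\gamma\log m} + e^{-(2-2\sigma_{jj'}+V)\gamma\log m}\right] = E_2, \qquad (7.20)
\end{aligned}$$

and

$$\begin{aligned}
E_4 &= p^2 E[\cos(\sqrt{2\gamma\log m}\,X_j)\cos(\sqrt{2\gamma\log m}\,X_{j'}) \mid \nu_j = 1, \nu_{j'} = 1]\\
&= \tfrac12 p^2\left[e^{-2(1+\sigma_{jj'}+V)\gamma\log m} + e^{-2(1-\sigma_{jj'}+V)\gamma\log m}\right]. \qquad (7.21)
\end{aligned}$$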


Combining (7.17)–(7.21) and doing some simple algebraic manipulations thereafter, we obtain

$$E[\cos(\sqrt{2\gamma\log m}\,X_j)\cos(\sqrt{2\gamma\log m}\,X_{j'})] = \tfrac12\left[e^{2\sigma_{jj'}\gamma\log m} + e^{-2\sigma_{jj'}\gamma\log m}\right]\left[(1-p)e^{-\gamma\log m} + p\,e^{-(1+V)\gamma\log m}\right]^2.$$

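Therefore, for each pair of indices $1 \le j, j' \le m$ with $j \neq j'$, we have

$$\mathrm{Cov}\left(\cos(\sqrt{2\gamma\log m}\,X_j), \cos(\sqrt{2\gamma\log m}\,X_{j'})\right) = \tfrac12\left(e^{2\sigma_{jj'}\gamma\log m} + e^{-2\sigma_{jj'}\gamma\log m} - 2\right)\left[(1-p)e^{-\gamma\log m} + p\,e^{-(1+V)\gamma\log m}\right]^2.$$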
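The algebra in (7.17)–(7.21) and the two displays that follow can be checked numerically. The sketch below assumes, as in (7.27), that given $(\nu_j, \nu_{j'}) = (r, s)$ the pair $(X_j, X_{j'})$ is bivariate normal with mean zero, variances $1 + rV$ and $1 + sV$ and covariance $\sigma_{jj'}$. It uses the Gaussian identity $E[\cos(aU)\cos(aW)] = \frac12[e^{-a^2\mathrm{Var}(U+W)/2} + e^{-a^2\mathrm{Var}(U-W)/2}]$, which is our reading of what Proposition 7.3 supplies. The helper names are illustrative, and a Monte Carlo estimate is included as an independent sanity check.

```python
import math

import numpy as np


def cos_moments(sigma, p, V, gamma, m):
    """Closed forms for one pair (j, j') with sigma = sigma_{jj'}.

    Returns (E[C_j C_j'] via E_1+...+E_4, the factored form, the covariance
    computed as E[C_j C_j'] - E[C_j]^2, the closed-form covariance), where
    C_j = cos(sqrt(2 gamma log m) X_j).
    """
    L = gamma * math.log(m)  # a^2 / 2 with a = sqrt(2 gamma log m)

    def pair(var_u, var_w):
        # E[cos(aU) cos(aW)] for a zero-mean bivariate normal (U, W)
        return 0.5 * (math.exp(-L * (var_u + var_w + 2 * sigma))
                      + math.exp(-L * (var_u + var_w - 2 * sigma)))

    e1 = (1 - p) ** 2 * pair(1, 1)
    e2 = p * (1 - p) * pair(1 + V, 1)
    e3 = p * (1 - p) * pair(1, 1 + V)
    e4 = p ** 2 * pair(1 + V, 1 + V)
    joint = e1 + e2 + e3 + e4
    mu = (1 - p) * math.exp(-L) + p * math.exp(-(1 + V) * L)  # E[C_j]
    cosh_term = 0.5 * (math.exp(2 * sigma * L) + math.exp(-2 * sigma * L))
    return joint, cosh_term * mu ** 2, joint - mu ** 2, (cosh_term - 1) * mu ** 2


def mc_joint(sigma, p, V, gamma, m, n=400_000, seed=0):
    """Monte Carlo estimate of E[C_j C_j'] under the same model."""
    rng = np.random.default_rng(seed)
    z = rng.multivariate_normal([0.0, 0.0], [[1.0, sigma], [sigma, 1.0]], size=n)
    nu = rng.random((n, 2)) < p                              # independent Bernoulli(p) labels
    theta = np.sqrt(V) * rng.standard_normal((n, 2)) * nu    # N(0, V) mean when nu = 1
    a = math.sqrt(2 * gamma * math.log(m))
    x = z + theta
    return float(np.mean(np.cos(a * x[:, 0]) * np.cos(a * x[:, 1])))
```

The factored form agrees with $E_1 + \cdots + E_4$ up to rounding, and the covariance is non-negative for every $\sigma_{jj'}$, vanishing exactly when $\sigma_{jj'} = 0$.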

Hence, by condition (4.2) in the statement of Theorem 4.1, we have

$$\lim_{m\to\infty}\frac{1}{m^{2(1-\gamma)}p^2}\sum_{j \neq j'}\mathrm{Cov}\left(\cos(\sqrt{2\gamma\log m}\,X_j), \cos(\sqrt{2\gamma\log m}\,X_{j'})\right) = 0. \qquad (7.22)$$


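Therefore, from (7.14) and (7.22) we obtain

$$\begin{aligned}
\limsup_{m\to\infty}\mathrm{Var}\left(\frac{\hat p(\gamma)}{p}\right) &\le \limsup_{m\to\infty}\frac{1}{m^{2(1-\gamma)}p^2}\sum_{j=1}^m \mathrm{Var}\left(\cos(\sqrt{2\gamma\log m}\,X_j)\right)\\
&\le \limsup_{m\to\infty}\frac{1}{m^{2(1-\gamma)}p^2}\sum_{j=1}^m E\left(\cos^2(\sqrt{2\gamma\log m}\,X_j)\right)\\
&\le \limsup_{m\to\infty}\frac{1}{m^{2(1-\gamma)}p^2}\sum_{j=1}^m 1\\
&= \limsup_{m\to\infty}\frac{m^{2\gamma-1}}{p^2} = 0,
\end{aligned}$$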
whence it follows that

$$\lim_{m\to\infty}\mathrm{Var}\left(\frac{\hat p(\gamma)}{p}\right) = 0. \qquad (7.23)$$

The rest of the proof follows immediately by combining equations (7.13), (7.16) and (7.23). ■
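As an illustration of Theorem 4.1, the following sketch implements an estimator of the form that (7.14) and (7.15) imply, $\hat p(\gamma) = 1 - m^{\gamma-1}\sum_{j=1}^m \cos(\sqrt{2\gamma\log m}\,X_j)$, together with its exact mean $p(1 - m^{-\gamma V})$ from (7.15). Only the independent case $\Sigma = I$ is simulated. This is a special case of the dependence the theorem allows, and the parameter values (and the restriction $\gamma < 1/2$ suggested by the bound $m^{2\gamma-1}/p^2$) are illustrative choices of ours.

```python
import math

import numpy as np


def p_hat(x, gamma):
    """p_hat(gamma) = 1 - m^(gamma-1) * sum_j cos(sqrt(2 gamma log m) X_j),
    the form implied by (7.14)-(7.15); gamma in (0, 1/2) is assumed here."""
    m = x.size
    t = math.sqrt(2 * gamma * math.log(m))
    return 1.0 - m ** (gamma - 1) * float(np.sum(np.cos(t * x)))


def expected_p_hat(p, V, gamma, m):
    # From (7.15): E cos(t X_j) = (1-p) m^{-gamma} + p m^{-gamma(1+V)},
    # hence E p_hat(gamma) = p (1 - m^{-gamma V}).
    return p * (1.0 - m ** (-gamma * V))


def simulate_independent(m, p, V, seed=0):
    """X_j = Z_j + theta_j with Z ~ N(0, I) and theta_j ~ N(0, V) w.p. p, else 0."""
    rng = np.random.default_rng(seed)
    nu = rng.random(m) < p
    return rng.standard_normal(m) + nu * math.sqrt(V) * rng.standard_normal(m)
```

With $m = 200{,}000$, $p = 0.1$, $V = 9$ and $\gamma = 0.1$, the relative bias $m^{-\gamma V}$ is of order $10^{-5}$, while the standard deviation of $\hat p(\gamma)$ is roughly $m^{\gamma - 1/2}$, about $0.005$ here. This illustrates the trade-off in $\gamma$ between the bias term (7.16) and the variance term (7.23).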


7.7 Proof of Theorem 4.2 

Proof. To prove Theorem 4.2, it will be enough to show that

$$\frac{\widehat{p\cdot V}}{p\cdot V} - 1 = \frac{\frac1m\sum_{j=1}^m X_j^2 - (1 + pV)}{pV} \to 0 \quad \text{as } m \to \infty. \qquad (7.24)$$


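For that, we first observe that for each $j = 1, \ldots, m$, $E(X_j^2) = 1 + pV$, and hence $E\left(\frac{\widehat{p\cdot V}}{p\cdot V} - 1\right) = 0$ for all $m$, so that

$$\begin{aligned}
E\left(\frac{\widehat{p\cdot V}}{p\cdot V} - 1\right)^2 &= \mathrm{Var}\left(\frac{\widehat{p\cdot V}}{p\cdot V}\right) = \frac{\mathrm{Var}(X_1^2)}{m p^2 V^2} + \frac{1}{m^2 p^2 V^2}\sum_{i \neq j}\mathrm{cov}(X_i^2, X_j^2)\\
&\le \frac{3(1-p) + 3p(1+V)^2}{m p^2 V^2} + \frac{1}{m^2 p^2 V^2}\sum_{i \neq j}\mathrm{cov}(X_i^2, X_j^2).
\end{aligned}$$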


Next we observe the following:

$$\mathrm{cov}(X_i^2, X_j^2) = 2\sigma_{ij}^2 \quad \text{for all } i \neq j, \qquad (7.25)$$

under the mixture model (2.3). To prove (7.25), let us fix any $1 \le i, j \le m$ with $i \neq j$. Observe that under (2.3), the joint distribution of $(X_i, X_j)^T$ is a four-component mixture of bivariate normal distributions given by

$$\begin{aligned}
f(x_i, x_j) &= (1-p)^2 f(x_i, x_j \mid \nu_i = 0, \nu_j = 0) + (1-p)p\, f(x_i, x_j \mid \nu_i = 1, \nu_j = 0)\\
&\quad + p(1-p)\, f(x_i, x_j \mid \nu_i = 0, \nu_j = 1) + p^2 f(x_i, x_j \mid \nu_i = 1, \nu_j = 1),
\end{aligned} \qquad (7.26)$$

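where

$$(X_i, X_j)^T \mid \nu_i = r, \nu_j = s \sim N_2\left(\begin{pmatrix}0\\0\end{pmatrix}, \begin{pmatrix}1 + rV & \sigma_{ij}\\ \sigma_{ij} & 1 + sV\end{pmatrix}\right) \qquad (7.27)$$

for $(r, s) \in \{0,1\} \times \{0,1\}$. Now, applying the law of iterated expectation and then using (7.27), we obtain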


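$$\begin{aligned}
E(X_i^2 X_j^2) &= \sum_{(r,s) \in \{0,1\}\times\{0,1\}} p^{r+s}(1-p)^{2-r-s}\,E(X_i^2 X_j^2 \mid \nu_i = r, \nu_j = s)\\
&= \sum_{(r,s) \in \{0,1\}\times\{0,1\}} p^{r+s}(1-p)^{2-r-s}\,E\left[X_i^2\,E(X_j^2 \mid X_i, \nu_i = r, \nu_j = s) \mid \nu_i = r, \nu_j = s\right]\\
&= (1-p)^2\left[1 + 2\sigma_{ij}^2\right] + 2p(1-p)\left[(1+V) + 2\sigma_{ij}^2\right] + p^2\left[(1+V)^2 + 2\sigma_{ij}^2\right]\\
&= 2\sigma_{ij}^2\left[(1-p)^2 + 2p(1-p) + p^2\right] + \left[(1-p)^2 + 2p(1-p)(1+V) + p^2(1+V)^2\right]\\
&= 2\sigma_{ij}^2 + (1 + pV)^2,
\end{aligned}$$

whence it follows that

$$\mathrm{cov}(X_i^2, X_j^2) = 2\sigma_{ij}^2 \quad [\text{since } E(X_i^2) = 1 + pV \text{ for all } i].$$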
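The four-term computation leading to $\mathrm{cov}(X_i^2, X_j^2) = 2\sigma_{ij}^2$ can be reproduced mechanically. The sketch sums over the four components of (7.26) and evaluates each conditional expectation under (7.27) with Isserlis' identity $E(U^2W^2) = E(U^2)E(W^2) + 2\{E(UW)\}^2$ for a zero-mean bivariate normal pair. The function names are illustrative.

```python
from itertools import product


def e_sq_prod(p, V, s):
    """E(X_i^2 X_j^2) under the four-component mixture (7.26)-(7.27), s = sigma_ij."""
    total = 0.0
    for r, t in product((0, 1), repeat=2):
        weight = p ** (r + t) * (1 - p) ** (2 - r - t)  # P(nu_i = r, nu_j = t)
        # Isserlis: E(U^2 W^2) = Var(U) Var(W) + 2 Cov(U, W)^2 for zero-mean normals
        total += weight * ((1 + r * V) * (1 + t * V) + 2 * s ** 2)
    return total


def cov_sq(p, V, s):
    # cov(X_i^2, X_j^2) = E(X_i^2 X_j^2) - E(X_i^2) E(X_j^2), with E(X_i^2) = 1 + pV
    return e_sq_prod(p, V, s) - (1 + p * V) ** 2
```

The result does not depend on $p$ or $V$, which is what makes the condition in (7.28) a condition on $\sum_{i \neq j}\sigma_{ij}^2$ alone.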

Then, by the given condition of Theorem 4.1, we have

$$\lim_{m\to\infty}\frac{1}{m^2 p^2 V^2}\sum_{i \neq j}\mathrm{cov}(X_i^2, X_j^2) = 0. \qquad (7.28)$$

The rest of the proof now follows trivially.
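A small simulation illustrating Theorem 4.2: the sketch computes $\widehat{p\cdot V} = m^{-1}\sum_j X_j^2 - 1$, the estimator appearing in (7.24), on data with AR(1)-type correlation $\sigma_{ij} = \rho^{|i-j|}$. For this structure $\sum_{i\neq j}\sigma_{ij}^2$ grows only linearly in $m$, so (7.28) holds for fixed $p$ and $V$. The AR(1) structure and all parameter values are our own illustrative choices, not taken from the paper.

```python
import numpy as np


def pv_hat(x):
    """Estimator of p*V from (7.24): (1/m) * sum_j X_j^2 - 1."""
    return float(np.mean(np.square(x)) - 1.0)


def simulate_ar1(m, p, V, rho, seed=0):
    """Illustrative data: Z is a stationary AR(1) sequence with unit variance,
    so sigma_ij = rho^|i-j|; theta_j ~ N(0, V) with probability p, else 0."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(m)
    z = np.empty(m)
    z[0] = eps[0]
    scale = np.sqrt(1.0 - rho ** 2)
    for j in range(1, m):
        z[j] = rho * z[j - 1] + scale * eps[j]
    nu = rng.random(m) < p
    theta = np.where(nu, rng.normal(0.0, np.sqrt(V), m), 0.0)
    return z + theta
```

With $m = 100{,}000$, $p = 0.1$, $V = 4$ and $\rho = 0.5$, the target is $pV = 0.4$ and the estimator's standard deviation is roughly $0.01$.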



























7.8 Proof of Lemma 5.1 

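Proof. Let us write $B_{1,\nu} = \Sigma + V B_{\nu,\nu_i=1}$ and $B_{0,\nu} = \Sigma + V B_{\nu,\nu_i=0}$. Then

$$\begin{aligned}
B_{1,\nu} &= (\Sigma + V B_{\nu,\nu_i=0}) + V\,\mathrm{diag}(e_i)\\
&= B_{0,\nu} + V\,\mathrm{diag}(e_i)\\
&= B_{0,\nu} + A I^{-1} A^T,
\end{aligned}$$

where $A = \sqrt{V}\,\mathrm{diag}(e_i)$. Using Proposition 7.2, it follows that

$$B_{1,\nu}^{-1} = (B_{0,\nu} + A I^{-1} A^T)^{-1} = B_{0,\nu}^{-1} - B_{0,\nu}^{-1}A(I + A^T B_{0,\nu}^{-1} A)^{-1} A^T B_{0,\nu}^{-1}.$$

Let $B_{0,\nu}^{-1} = ((b_{ij}))_{m \times m}$. Since $B_{0,\nu}$ is positive definite, $B_{0,\nu}^{-1}$ is positive definite, so $b_{ii} > 0$ for all $i = 1, \ldots, m$.

Now observe that pre- and post-multiplying $B_{0,\nu}^{-1}$ by $\mathrm{diag}(e_i)$ zeroes every entry except the $(i,i)$-th one, so that

$$A^T B_{0,\nu}^{-1} A = V\,\mathrm{diag}(0, \ldots, 0, b_{ii}, 0, \ldots, 0) = V b_{ii}\,\mathrm{diag}(e_i).$$

Therefore,

$$I + A^T B_{0,\nu}^{-1} A = \mathrm{diag}(1, \ldots, 1, 1 + V b_{ii}, 1, \ldots, 1),$$

which implies that

$$(I + A^T B_{0,\nu}^{-1} A)^{-1} = \mathrm{diag}\left(1, \ldots, 1, \frac{1}{1 + V b_{ii}}, 1, \ldots, 1\right).$$

Noting that $A = A^T$ and using the preceding fact, it follows that

$$A(I + A^T B_{0,\nu}^{-1} A)^{-1} A^T = \frac{V}{1 + V b_{ii}}\,\mathrm{diag}(e_i) = \frac{1}{1 + V b_{ii}}\,A A^T.$$

Therefore we have

$$B_{0,\nu}^{-1}A(I + A^T B_{0,\nu}^{-1} A)^{-1} A^T B_{0,\nu}^{-1} = \frac{1}{1 + V b_{ii}}\,B_{0,\nu}^{-1} A A^T B_{0,\nu}^{-1} = \frac{V}{1 + V b_{ii}}\,B_{0,\nu}^{-1}\mathrm{diag}(e_i)\,\mathrm{diag}(e_i)^T B_{0,\nu}^{-1}.$$

Now observe that

$$B_{0,\nu}^{-1}\mathrm{diag}(e_i) = (0, \ldots, 0, b_i, 0, \ldots, 0),$$

where $b_i$ denotes the $i$-th column of $B_{0,\nu}^{-1}$, whence it follows that

$$B_{0,\nu}^{-1}A(I + A^T B_{0,\nu}^{-1} A)^{-1} A^T B_{0,\nu}^{-1} = \frac{V}{1 + V b_{ii}}\,b_i b_i^T.$$

Thus for any $\nu \in \{0,1\}^m$ we have the following identity:

$$(\Sigma + V B_{\nu,\nu_i=1})^{-1} = (\Sigma + V B_{\nu,\nu_i=0})^{-1} - \frac{V}{1 + V b_{ii}}\,b_i b_i^T. \qquad ■$$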
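The identity of Lemma 5.1 is a Sherman–Morrison rank-one update and is easy to confirm numerically. The sketch below assumes that $B_\nu$ denotes $\mathrm{diag}(\nu)$, which is consistent with $B_{\nu,\nu_i=1} = B_{\nu,\nu_i=0} + \mathrm{diag}(e_i)$ as used above. The helper name is hypothetical.

```python
import numpy as np


def inverse_after_switch_on(sigma, nu, V, i):
    """(Sigma + V B_{nu, nu_i=1})^{-1} from (Sigma + V B_{nu, nu_i=0})^{-1},
    assuming B_nu = diag(nu); hypothetical helper illustrating Lemma 5.1."""
    nu0 = np.asarray(nu, dtype=float).copy()
    nu0[i] = 0.0
    b0_inv = np.linalg.inv(sigma + V * np.diag(nu0))   # B_{0,nu}^{-1} = ((b_kl))
    b_i = b0_inv[:, i]                                   # i-th column of B_{0,nu}^{-1}
    return b0_inv - (V / (1.0 + V * b0_inv[i, i])) * np.outer(b_i, b_i)
```

In a computation that sweeps over configurations $\nu$, this lets one switch $\nu_i$ from 0 to 1 at $O(m^2)$ cost instead of a fresh $O(m^3)$ inversion.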
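7.9 Proof of Lemma 5.2

Proof. Let us write $B_{1,\nu} = \Sigma + V B_{\nu,\nu_i=1}$ and $B_{0,\nu} = \Sigma + V B_{\nu,\nu_i=0}$. Then

$$\frac{|\Sigma + V B_{\nu,\nu_i=1}|}{|\Sigma + V B_{\nu,\nu_i=0}|} = \frac{|B_{1,\nu}|}{|B_{0,\nu}|} = |B_{1,\nu} B_{0,\nu}^{-1}|.$$

Observe that $B_{1,\nu} = B_{0,\nu} + V\,\mathrm{diag}(e_i)$. Then, using the facts derived in the proof of Lemma 5.1, it follows that

$$|B_{1,\nu} B_{0,\nu}^{-1}| = |I + V\,\mathrm{diag}(e_i) B_{0,\nu}^{-1}| = 1 + V b_{ii}. \qquad ■$$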
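Similarly, the determinant ratio in Lemma 5.2 can be checked numerically, again assuming $B_\nu = \mathrm{diag}(\nu)$. Log-determinants are used so that the check also behaves for larger $m$. The helper name is illustrative.

```python
import numpy as np


def det_ratio_two_ways(sigma, nu, V, i):
    """Return (|Sigma + V B_{nu,nu_i=1}| / |Sigma + V B_{nu,nu_i=0}|, 1 + V b_ii),
    assuming B_nu = diag(nu)."""
    nu0 = np.asarray(nu, dtype=float).copy()
    nu0[i] = 0.0
    nu1 = nu0.copy()
    nu1[i] = 1.0
    b0 = sigma + V * np.diag(nu0)
    b1 = sigma + V * np.diag(nu1)
    _, logdet0 = np.linalg.slogdet(b0)  # log-determinants avoid overflow for large m
    _, logdet1 = np.linalg.slogdet(b1)
    b_ii = np.linalg.inv(b0)[i, i]
    return float(np.exp(logdet1 - logdet0)), float(1.0 + V * b_ii)
```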

7.10 Proof of Lemma 5.3 

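Proof. We can write $|\Sigma + V B_{\nu,\nu_i=0}|$ and $|\Sigma + V B_{\nu,\nu_i=1}|$ as follows:

$$|\Sigma + V B_{\nu,\nu_i=0}| = \left|(\Sigma + V B_{\nu,\nu_i=0})_{(-i,-i)}\right|\left(\sigma_{ii} - \sigma_{(-i)}^T\left((\Sigma + V B_{\nu,\nu_i=0})_{(-i,-i)}\right)^{-1}\sigma_{(-i)}\right)$$

and

$$|\Sigma + V B_{\nu,\nu_i=1}| = \left|(\Sigma + V B_{\nu,\nu_i=1})_{(-i,-i)}\right|\left(\sigma_{ii} + V - \sigma_{(-i)}^T\left((\Sigma + V B_{\nu,\nu_i=0})_{(-i,-i)}\right)^{-1}\sigma_{(-i)}\right),$$

where $A_{(-i,-i)}$ denotes the submatrix of a matrix $A$ obtained by deleting its $i$-th row and column (so that $|A_{(-i,-i)}|$ is the $(i,i)$-th cofactor of $A$), and $\sigma_{(-i)}$ is the covariance vector between $X_i$ and the rest of the $X_j$'s.

Next observe that

$$\frac{|\Sigma + V B_{\nu,\nu_i=1}|}{|\Sigma + V B_{\nu,\nu_i=0}|} = |I + V B_{0,\nu}^{-1}\mathrm{diag}(e_i)| = |I + V(0, \ldots, 0, b_i(\nu), 0, \ldots, 0)| = 1 + V b_{ii}(\nu),$$

and

$$(\Sigma + V B_{\nu,\nu_i=0})_{(-i,-i)} = (\Sigma + V B_{\nu,\nu_i=1})_{(-i,-i)}.$$

Therefore,

$$\frac{|\Sigma + V B_{\nu,\nu_i=1}|}{|\Sigma + V B_{\nu,\nu_i=0}|} = \frac{\sigma_{ii} + V - \sigma_{(-i)}^T\left((\Sigma + V B_{\nu,\nu_i=0})_{(-i,-i)}\right)^{-1}\sigma_{(-i)}}{\sigma_{ii} - \sigma_{(-i)}^T\left((\Sigma + V B_{\nu,\nu_i=0})_{(-i,-i)}\right)^{-1}\sigma_{(-i)}}.$$

Then, using equation (3) and Lemma 5.2, it follows that

$$b_{ii}(\nu) = \left[\sigma_{ii} - \sigma_{(-i)}^T\left((\Sigma + V B_{\nu,\nu_i=0})_{(-i,-i)}\right)^{-1}\sigma_{(-i)}\right]^{-1}. \qquad (7.29)$$

The proof of the remaining part now follows trivially.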
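The expression for $b_{ii}(\nu)$ in (7.29) is the familiar fact that a diagonal entry of an inverse equals the reciprocal of the corresponding Schur complement. The sketch computes it both ways under the same assumption $B_\nu = \mathrm{diag}(\nu)$. The helper name is illustrative.

```python
import numpy as np


def b_ii_two_ways(sigma, nu, V, i):
    """(i,i) entry of (Sigma + V B_{nu,nu_i=0})^{-1}, computed directly and via
    the Schur complement in (7.29); assumes B_nu = diag(nu)."""
    nu0 = np.asarray(nu, dtype=float).copy()
    nu0[i] = 0.0
    b0 = sigma + V * np.diag(nu0)
    direct = np.linalg.inv(b0)[i, i]
    rest = [k for k in range(len(nu0)) if k != i]
    sub = b0[np.ix_(rest, rest)]   # (Sigma + V B_{nu,nu_i=0})_{(-i,-i)}
    s = sigma[rest, i]             # sigma_{(-i)}: covariances of X_i with the rest
    schur = sigma[i, i] - s @ np.linalg.solve(sub, s)
    return float(direct), float(1.0 / schur)
```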


8 Discussion 

We consider in this article the problem of simultaneous testing of the individual components of a multivariate normal mean vector when the underlying covariance matrix is assumed to be known. We propose a stepwise Bayesian testing procedure assuming a two-component point mass mixture prior over the unknown means. The proposed Bayesian stepdown procedure is generic and can also be applied to non-normal models. A decision theoretic justification for the newly developed testing procedure is established by showing that it possesses a certain desirable convexity property essential for the admissibility of a multiple testing procedure. Consistent estimation of the proportion of true alternatives and of the variance of the non-zero means is established under certain weak correlation structures. An alternative representation of the proposed test statistics has also been established that makes the computation faster. We hope that the present Bayesian stepwise procedure can be very useful in many practical situations.


For the present multiple testing problem, one can also use the optimal Bayes rule assuming an additive loss function and report the posterior inclusion probabilities $\pi(\theta_i \neq 0 \mid X)$, $i = 1, \ldots, m$.
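To see why exact posterior inclusion probabilities become expensive, the sketch below computes them by brute-force enumeration over all $2^m$ configurations of $\nu$. It assumes, consistently with (7.27) and the matrices $\Sigma + V B_\nu$ of Section 7.8, that $X \mid \nu \sim N_m(0, \Sigma + V\,\mathrm{diag}(\nu))$ with independent $\nu_i \sim \mathrm{Bernoulli}(p)$. The cost is $2^m$ Gaussian log-density evaluations, each $O(m^3)$, so this is feasible only for small $m$. It is an illustration of the computational burden, not the paper's procedure.

```python
import itertools

import numpy as np


def log_normal_density(x, cov):
    """log N_m(x; 0, cov) computed through a Cholesky factor."""
    chol = np.linalg.cholesky(cov)
    z = np.linalg.solve(chol, x)
    return -0.5 * (x.size * np.log(2.0 * np.pi)
                   + 2.0 * np.sum(np.log(np.diag(chol))) + z @ z)


def inclusion_probabilities(x, sigma, p, V):
    """P(nu_i = 1 | x) by enumerating all 2^m configurations (small m only),
    assuming X | nu ~ N_m(0, Sigma + V diag(nu)) and nu_i iid Bernoulli(p)."""
    m = x.size
    configs = np.array(list(itertools.product((0, 1), repeat=m)), dtype=float)
    log_w = np.empty(len(configs))
    for k, nu in enumerate(configs):
        n1 = nu.sum()
        log_prior = n1 * np.log(p) + (m - n1) * np.log(1.0 - p)
        log_w[k] = log_prior + log_normal_density(x, sigma + V * np.diag(nu))
    w = np.exp(log_w - log_w.max())  # stabilised unnormalised posterior weights
    w /= w.sum()
    return configs.T @ w
```

When $\Sigma = I$ the posterior factorises and each probability reduces to the familiar two-group formula, which gives a simple check. Under general dependence no such shortcut is available.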

However, as already mentioned, evaluating the posterior inclusion probabilities can often be computationally very demanding, especially when one wishes to numerically evaluate the optimal Bayes risk by replicating the experiment a large number of times, say, 5000 times, when the number of hypotheses $m$ is large. It should be stressed at this point that finding an analytic expression of the optimal Bayes risk, at least asymptotically, is indeed a very hard problem to solve, even under specific forms of dependence. In a recent work, Bogdan et al. (2011) show that when the $X_i$'s are i.i.d. observations from a two-component Gaussian mixture distribution, the popular BH method and the Bonferroni procedure become asymptotically Bayes optimal under sparsity (ABOS). They also find conditions under which a multiple testing procedure becomes ABOS when the test statistics are independent. A natural question to ask then is under what conditions a multiple testing procedure, such as the BH method or the BSD method, becomes ABOS for the present multivariate normal mean problem. This is definitely one of the most important and challenging open problems in this domain so far. Another very important and interesting problem is to investigate more general conditions under which consistent estimation of the proportion of non-nulls is possible. It would be of immense theoretical importance to investigate whether such an estimator attains any optimal rate of convergence, such as the minimax rate of convergence, for the present multiple testing problem. We hope to address these problems elsewhere in the future.